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1 Introduction 

The purpose of these notes is to provide an overview of some aspects of optimal and robust 
control theory considered relevant to quantum control. The notes begin with classical 
deterministic optimal control, move through classical stochastic and robust control, and 
conclude with quantum feedback control. Optimal control theory is a systematic approach 
to controller design whereby the desired performance objectives are encoded in a cost 
function, which is subsequently optimized to determine the desired controller. Robust 
control theory aims to enhance the robustness (ability to withstand, to some extent, 
uncertainty, errors, etc) of controller designs by explicitly including uncertainty models 
in the design process. Some of the material is in continuous time, while other material 
is written in discrete time. There are two underlying and universal themes in the notes: 
dynamic programming and filtering. 

Dynamic programming is one of the two fundamental tools of optimal control, the other 
being Pontryagin's principle, [24]. Dynamic programming is a means by which candidate 
optimal controls can be verified optimal. The procedure is to find a suitable solution to 
a dynamic programming equation (DPE), which encodes the optimal performance, and to 
use it to compare the performance of a candidate optimal control. Candidate controls 
may be determined from Pontryagin's principle, or directly from the solution to the DPE. 
In general it is difficult to solve DPEs. Explicit solutions exist in cases like the linear 
quadratic regulator, but in general approximations must usually be sought. In addition, 
there are some technical complications regarding the DPE. In continuous time, the DPE 
is a nonlinear PDE, commonly called the Hamilton- J acobi- Bellman (HJB) equation. The 
complications concern differentiability, or lackthereof, and occur even in "simple" classical 
deterministic problems, section 2. This is one reason it can be helpful to work in discrete 
time, where such regularity issues are much simpler (another reason for working in discrete 
time is to facilitate digital implementation). 



3 



Filtering concerns the processing of measurement information. In optimal control, 
filters are used to represent information about the system and control problem of interest. 
In general, this information is incomplete, i.e. the state is typically not fully accessible, 
and may be corrupted by noise. To solve optimal control problems in these situations, 
the cost function is expressed in terms of the state of a suitably chosen filter, which is 
often called an information state. Dynamic programming can then be applied using the 
information state dynamics. The nature of the measurements and the purpose for which 
the data is to be used determine the architecture of the filter. In stochastic situations, 
this is closely linked to the probabilistic concept of conditional expectation. The famous 
Kalman filter computes dynamically conditional expectations (of states given measure- 
ments in linear gaussian models), which are also optimal estimates in the mean square 
error sense. The quantum Belavkin filter, or stochastic master equation, also computes 
a quantum version of conditional expectation. In linear gaussian cases, the information 
states are gaussian, a fact which considerably simplifies matters due to the finite num- 
ber of parameters. Filters such as these based on computing conditional expectations of 
states or system variables do not include any information about the cost or performance 
objective. While this is not an issue for many problems such as LQG, where the task 
of estimation can be completely decoupled from that of control [17], there are important 
problems where the filter dynamics must be modified to take into account the control 
objective. These problems include LEQG [48, 49] or risk-sensitive control [8, 37], and H°° 
robust control [19, 54]. 

Figure 1 shows a physical system being controlled in a feedback loop. The so-called 
separation structure of the controller is shown. The control values are computed in the box 
marked "control", using a function of the information state determined using dynamic 
programming. The information state, as has been mentioned, is the state of the filter 
whose dynamics are built into the box marked "filter". This structure embodies the two 
themes of these notes. 



input 



physical system 



output 



y 



control 



filter 



feedback controller 



Figure 1: Feedback controller showing the separation structure. 



These notes were assembled from various lecture notes and research papers, and so we 
apologize for the inevitable inconsistencies that resulted. 
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2 Deterministic Dynamic Programming and Viscos- 
ity Solutions 

References for this section include [24], [25], [3], [15]. 

2.1 Introduction 

2.1.1 Preamble 

Hamilton- J acobi (HJ) equations are nonlinear first-order partial differential equations of 
the form 

F(x,V(x),VV(x)) =0 (1) 

(one can also consider second-order equations but we do not do so here). V(x) (x e C 
R n ) is the unknown function to be solved for, and VV(i) = ( , . . . , ) denotes 
the gradient. F(x,v,X) is a nonlinear function. 

HJ equations have a long history, dating back at least to the calculus of variations 
of the 19th century, and HJ equations find wide application in science, engineering, etc. 
Perhaps surprisingly, it was only relatively recently that a satisfactory general notion 
of solutions for (1) became available, with the introduction of the concept of viscosity 
solution (Crandall-Lions, c. 1980). The difficulty, of course, is that solutions are not in 
general globally smooth (e.g. C 1 ). Solutions are often smooth in certain regions, in which 
the famous method of characteristics may be used to construct solutions. There are a 
number of other notions of solution available, such as encountered in non-smooth analysis 
(e.g. proximal solution), though we will not discuss them here. 

In Engineering our principal interest in HJ equations lies in their connection with 
optimal control (and games) via the dynamic programming methodology. The value func- 
tion is a solution to an HJ equation, and solutions of HJ equations can be used to test 
a controller for optimality, or perhaps to construct a feedback controller. In these notes 
we discuss dynamic programming and viscosity solutions in the context of two examples, 
and make some mention of the general theory. 

2.1.2 Optimal Control 

As a first and perhaps familiar example (e.g. LQR), let's consider a finite time horizon 
optimal control problem defined on a time interval [^o^i] : 

J*(t ,x ) = inf J(t ,x ,u(-)) (2) 

u(-) 

Here, x is the initial state at time t , and u(-) is the control; J (to, x , «(•)) represent the 
associated cost. 

To be specific, and to prepare us for dynamic programming, suppose one wants to 
minimize the cost functional 

J(t,x;u(-)) = J L(x(s),u(s))ds + i;(x(t 1 )), (3) 
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where x(-) is the solution of the initial value problem 

x(s) = f(x(s),u(s)), t<s<ti, 

x(t) 



(4) 



x. 



Here, t £ [£o,*i] is a "variable" initial time, u(-) is a control defined on [t,ti] taking values 
in, say, U C R m (U closed), and x(-) is the state trajectory in R™. We denote by W t , tl a 
space of admissible controls, containing at least the piecewise continuous controls. 
The value function is defined by 



V(t,x) = inf J(t,x;u(-)) 



(5) 



for (t,x) G [to,ti] x R™. The dynamic programming principle states that for every r e 
[Mi], 



V(t,x) 



inf 



L(x(s),u(s)) ds + V(r, x(r)) 



(we will prove this later on). From this, one can derive formally the equation 
d 



dt 



V(t,x) + H(x,V x V(t,x)) = in (t ,t!) x R n , 



with terminal data 



V(t u x) 

Here, the Hamiltonian is given by 



i/j(x) in R". 



H(x,X) = inf {\- f(x,v) + L(x,v)} 



(6) 

(7) 
(8) 

(9) 



The nonlinear first order PDE (7) is the dynamic programming PDE or Hamilton- Jacobi- 
Bellman (HJB) equation. The pair (7), (8) specify what is called a Cauchy problem, and 
can be viewed as a special case of (1) together with suitable boundary conditions, using 
O = (to,ti) x R ra . Notice that the Hamiltonian (9) is concave in the variable A (since it 
is the infimum of linear functions). 

Let us see how (7) is obtained. Set r = t + h, h > 0, and rearrange (6) to yield 



inf 

«(■) 



i(V(t + h, x(t + h)) - V(t, x)) + i 



t+h 



L(x(s),u(s)) ds 



0. 



If V and u(-) are sufficiently smooth, then 



y(V(t + h,x(t + h)) -V(t,x)) -> ^-V(x,t) + V x V(x,t) • f(x,u(t)) as /i -> 
n, at 



and 



1 f 

— / L(x(s), u(s)) ds — > L(x, as /i — > 0. 
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Combining these displays one is formally led to (7). A proof of (7) when V is sufficiently 
smooth requires a careful derivation of two inequalities which combine to give (7). Below 
we will prove that V is a viscosity solution of (7); in fact, the unique one satisfying the 
terminal condition (8). 

Verification. Let V(t,x) be a C 1 solution of (7), (8). Let u(-) G W tl)tl be any control. 
Then using (7) 

%V(t,x(t)) = iV^xit)) + VV(t,x(t))x(t) 

= |y(f , x(t)) + VV{t, x(t))f(x(t),u(t)) 
> -L(x(t),u(t)) 

Integrating, we get 

V(h, x(h)) - V(t , x ) > - [ L(x(t),u(t))dt 

Jto 

V(t ,x ) < J* 1 L(x(t), M (t))rft + y(t 1 ,x(t 1 )) 
= j; 1 L(x(t),u(t))dt + ^(xih)) 

using (8). This shows that V(to,xo) < V(t ,x ) (V is the value function defined by (5)). 
Now this same calculation for the control u(-) = u*(-) G U to ^ L satisfying 

u*(t) e a rgmm\v x V(t,x*(t))- f(x*(t),v) + L(x*(t),v)\, (10) 

V&J L } 

for t G where #*(•) is the corresponding state trajectory, gives 

V{t ,x ) = / 1 L(x*(t), M *(t))rft + V(a;*(t 1 )) 

Jt 

showing that in fact u* is optimal and V(t ,x ) = V(t ,x ). Indeed we have V — V in 
[t ,ti] x R™ by this arguement, and so we have shown that any smooth solution to (7), 
(8) must equal the value function - this is a uniqueness result. Unfortuneatly, in general 
there may be no such smooth solutions. 

Optimal feedback. The above calculations suggest how one might obtain an optimal 
feedback controller. To simplify a bit, suppose that 

U = R m , f(x,u) = f(x)+g(x)u, L(x,u) = £(x) + l\u\ 2 . 

Then evaluating the infimum in (9) gives 

u* = -g(x)'\' 

and 

H(x, A) = A/(:r) - \\g{x)g{x)y + £(x). 
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Hence the HJB equation can be written as 

+ W/ - \VVgg'VV + I = (11) 

with optimal feedback controller 

u*(t,x) = -g(x)VV(t,x)'. (12) 

This means that the optimal control «*(•) G U is given by 

u *(t) = u*(t,x*(t)), t Q <t<t v 

Of course, this makes sense only when V is sufficiently smooth. 

The equation (11) is sometimes refered to as a nonlinear Riccati equation. 
LQR. Take 

U = R m , f(x,u)=Ax + Bu, L(x,u) = l\x\ 2 + l\u\ 2 , i/j(x) = \x'^x. 
As a trial solution of (11) we use 

V(t,x) = \x'P(t)x, 
where P(t) > (symmetric) is to be determined. Now 

^-V{t,x) = \x'P{t)x, and VV(t,x) = x'P(t). 

Plugging these into (11) gives 

\x'P(t)x + x'P(t)Ax - \x'P(t)BB'P(t)x + \x'x = 0. 

Since this holds for all x G R™ we must have 

P(t) + A' P{t) + P(t)A - P{t)BB'P{t) + / = 0. (13) 

At time t = ti we have V(ti,x) = \x'^x, and so 

P(h) = (14) 

Therefore if there exists a C 1 solution P{t) to the Riccati differential equation (13) on 
[t ,ti] with terminal condition (14) we obtain a smooth solution \x'P(t)x to (7), (8), and 
as argued above the value function for the LQR problem is given by 

V(t,x) = \x'P(t)x. (15) 

The optimal feedback controller is given by 

u*(t,x) = -B'P{t)x. (16) 

This gives the optimal control «*(•) G U: 

u*(t) = -B'P(t)x*(t), £ (17) 
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2.1.3 Distance Function 



As another example, we consider the distance function d(x, dQ) to the boundary dfl of 
an open, bounded set Q C R n . In some ways the HJ equation for this function is simpler 
than that of the optimal control problem described above, and we can more easily explain 
viscosity solutions and issues of uniqueness, etc, in this context. 
The distance function is defined by 

d(x ) dVt)= inf \x — y\. (18) 

yedn 

Note that the infimum here is always attained, not necessarily uniquely, since dQ is 
compact and y ^ \x — y\ is continuous; denote by ir(x) C dQ the set of minimizing y. 
We write 

V(x) = d(x,dtt) (19) 

for simplicity, and consider V(x) as a function on the closed set Q. It can be verified that 
V(x) is a non-negative Lipschitz continuous function. In fact, we shall see that V is the 
unique continuous viscosity solution of 

|W| -1 = in n (20) 

satisfying the boundary condition 

7 = on an. (21) 

Equations (20) and (21) constitute a Dirichlet problem. 

Example 2.1 Q = (-1, 1) C R 1 . Here, OQ = {-1, 1} and U = [-1, 1]. Then 



V(x) 



1 + x if - 1 < x < 
1 - x if < x < 1 



which is Lipschitz continuous, and differentiable except at x — 0. At each point x ^ 
V solves the HJ equation (20), and V satisfies the boundary condition (21) (V(— 1) = 
v(l) = 0), see Figure 2. Note that ir(x) = -1 for -1 < x < 0, tt(x) = 1 for < x < 1, 
and 7r(0) = { — 1,1}. The Lipschitz function V 1 (x) — \x\ — 1 also satisfies (20) a.e. and 
(21); there are many other such functions. 

Dynamic programming. The distance function satisfies a simple version of the 
dynamic programming principle: for any r > we have 

V(x) = inf {\x-z\ +V(z)}. (22) 

\x—z\<r 

We will use this later to show that V is a viscosity solution of (20), but for now we discuss 
and derive (22). 
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Figure 2: Distance function V and another Lipschitz solution V 1 . 

Fix x E Q and r > 0, and let \x — z\ < r. Choose y*(z) £ n(z), so that V(z) = 
\z-y*(z)\. Then 

V(x) < \x-y*(z)\ 

< \x — z\ + \z — y*(z)\ 
= \x — z\ + V(z). 

Since this holds for all \x — z\ < r we have 

V{x) < inf +V(z)}. 

\x—z\<r 

To see that equality holds, simply take z — x. Thus establishes (22). Note that there are 
many minimizers z* for the RHS of (22), viz. segments of the lines joining x to points in 
7r(x). 

2.1.4 Viscosity Solutions 

We turn now to the concept of viscosity solution for the HJ equation (1). The terminology 
comes from the vanishing viscosity method, which finds a solution V of (1) as a limit 
V s — > V of solutions to 

--AV £ (x) + F(x, V £ (x), VV £ (x)) = (23) 
2 

The Laplacian term f AV e = | Y^=i J^^ e can be used to model fluid viscosity. The 
definition below is quite independent of this limiting construction, and is closely related 
to dynamic programming; however, the definition applies also to equations that do not 
necessarily correspond to optimal control. 
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A function V G C(Q) is a viscosity subsolution of (1) if, for any G C 1 (f2) and any 
local maximum xq G f2 of V — we have 

F(io,^o),V^o))<0 (24) 

A function V G C(Q) is a viscosity supersolution of (1) if, for any G C 1 (f2) and any 
local minimum xq G f2 of V — we have 

F(xo,y(x o ),V0(x o )) >0 (25) 

A function V G C(f2) is a viscosity solution of (1) if it is both a subsolution and a 
supersolution. 

This definition may at first sight appear strange, though in practice it is often easy to 
use. Note that derivatives in (24) and (25) appear only on the smooth function 0. There 
are a number of equivalent formulations, and the key point is that the definitions relate 
sub- or superdifferentials (of functions which need not be differentiable) to inequalities 
associated with the HJ equation. 

The superdifferential of a function V G C(Q) is defined by 

D + V(x) = {A G R™ : hmsup V{v) - V{x) _ - X(V - %) < 0} (26) 
The subdifferential of a function V G C(fi) is defined by 

D-V(x) = {A G R™ : liminf V M~ V W > 0} (27) 

y->x,yen \x — y\ 

UVe C\Q) then D + V(x) = D~V(x) = {VV(x)}. In general, A G D+V(x) iff 
there exists G C 1 (fi) such that V0(x) = A and V — has a local maximum at x; and 
A G Z^V^rr) iff there exists G C 1 ^) such that V0(x) = A and V — has a local 
minimum at x. 

Therefore the viscosity definition is equivalently characterized by 

F(x, V(x), A) < V AG D + V(x) (28) 

and 

F(x,V(x),X) > V A G D~V(x) (29) 
Example 2.2 Continuing with Example 2.1, we see that 



D + V(x) = D~V(x) = 

D + V(0) =[-1,1], 
D~V(0) = 0. 



{1} if -Ki<0 
{-1} if < x < 1 



(30) 



Consequently V is a viscosity solution of (20). 

However, the function V 1 is not is viscosity solution, since G D^V 1 (0) = [—1,1], and 

1 1 - 1 £ 0. 
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Properties. Some properties of viscosity solutions: 

1. (Consistency.) If V G C(ft) is a viscosity solution of (1), then for any point :r G ft 
at which V is differentiable we have 

F(x,V(x),VV(x)) = 0. 

2. If V is locally Lipschitz continuous in ft, then 

F(x,V{x),W{x)) = a.e. in ft. 

3. (Stability.) Let V N G C(ft) (iV > 0) be viscosity solutions of 

F N (x,V N (x),VV N (x)) = in ft, 

and assume — > V locally uniformly in ft, and F N — > F locally uniformly in 
ft x R x R n , as N — > oo. Then F G C(ft) is a viscosity solution of (1). 

4. (Monotonic change of variable.) Let V G C(ft) be a viscosity solution of (1) and 
* G C^R) be such that $'(t) > 0. Then W = $(V) is a viscosity solution of 

V(W(x)), V'(W(x))VW(x)) = (31) 

where \1> = 



2.2 Value Functions are Viscosity Solutions 
2.2.1 The Distance Function is a Viscosity Solution 

We showed in Example 2.2 that in the specific case at hand the distance function is a 
viscosity solution. Let's now consider the general case. We use the dynamic programming 
principle (22) to illustrate a general methodology. 

Subsolution property. Let <fi G C 1 (ft) and suppose that V — (ft attains a local maximum 
at x G ft; so there exists r > such that the ball B(x ,r) C ft and 

V(x) - (ft{x) < V(x ) - (ft(x ) \/xeB(x ,r). (32) 

We want to show that 

|W(x o )|-l<0. (33) 

Let h G R n , and set x = x + th. Then for t > sufficiently small x G B(x ,r), and 
so from (32), 

-{<f>(x + th) - <j>(x )) < -{V{x + th) - V(x )) (34) 
Now from the dynamic programming principle (22) we have 

V(x ) <t\h\ + V(x + th) (35) 
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Combining (34) and (35) we find that 

-(<Kx + th)-<t>(x ))<t\h\, (36) 

and so 

_ ( ^o+|pM) _ i < o (37) 

Send 1 1 to obtain 

-^V<f>(x )h-1 <0. (38) 

Since /i G R n is arbitrary we obtain (32) as required. This proves that V is a viscosity 
subsolution. 

Supersolution property. Let G C 1 (i7) and suppose that V"— attains a local minimum 
at x G f2; so there exists r > such that the ball B(x ,r) C fi and 

V(ar) - 0(x) > y(x ) - 0(z o ) Vie%,r). (39) 

We want to show that 

|W(xo)|-l>0. (40) 
Suppose that (40) is false, so that 

|V0(x o )| - 1 < -a < (41) 

for some 1 > a > 0. By making r > smaller if necessary, we may assume 

\V(j)(x)\ - 1 < -a/2 < V x E B(x ,r). (42) 

By the fundamental theorem of calculus, we have 



(x) = (f)(x ) + / V(f)(-fx + (1 - 7)x )(x - x )d'y 
Jo 



(43) 



Now from the dynamic programming relation (22), pick z* G B(x ,r), z* ^ x , such 
that 

V(x ) = \x -z*\ + V(z*). (44) 

Using this and (39) we have 

-(0(z*)-0(rro))>k*-zo|. (45) 

However, from (42) and (43) we must have 

-(<j>(z*) - <f>(x )) < (1 - a/2)|^ - x |. (46) 

Inequalities (45) and (46) are in contradiction, so in fact (40) holds. This proves that V 
is a supersolution. 

It can be seen here that the dynamic programming principle provided the key inequal- 
ities to derive the sub- and supersolution relations. 
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2.2.2 The Optimal Control Value Function is a Viscosity Solution 



Dynamic programming. The dynamic programming principle states that for every 

r g [t,ti], 



V(t,x) = inf 



L(x(s),u(s)) als + V(r, x(r)) 



(6) 



We now prove this (by a standard technique in optimal control). 

Fix r G [Mi], an d u(-) G Ut,t\- Let x(-) denote the corresponding trajectory with 
initial state x(t) = x, and consider (r,x(r)). Let e > and choose Ui(-) G U r>tl , with 
trajectory on [r,ti] with Xi(r) = x(r) be such that 



V(r, x(r)) > J(r, x(r); ~~ £ - 



Define 



u 2 (s) 



u(s) t < s < r 
Ui(s) r < s < t\ 



(47) 



(48) 



with trajectory a^O, x i{t) = x. Now X2(s) = x(s), s G [t, r], and x 2 (s) = #i(s), s G [r, ti]. 
Next, 

V(r,x) < J(t,x;u 2 (-)) 

= J t t t 1 L(x(s),u(s))ds + ^(x(t 1 )) 
= f r L(x(s),u(s))ds+ r t r L(xi(s),^i(s))ds + V'(^(ii)) 
= f* L(x(s),u(s))ds + V(r,x(r)) + e 

using (47). Therefore 



(49) 



V(t,x)< inf 

«(-)SWt,r 

Since 5 > was arbitrary, we have 



V(t,x)< inf 

u(-)6W t/ . 



/r 
L(x(s), -u(s)) (is + V(r, x(r)) 



+ e. 



(50) 



L(x(s),u(s)) ds + V(r, x(r)) 



(51) 



This proves one half of (6). 

For the second half of (6), let u(-) G Ut,ti, an d let x (') be the corresponding trajectory 
with x(t) = x. Then 

J(t,x;u(-)) = f* 1 L(x(s),u(s)) ds + ^(xih)) 

= J t r L(x(s),u(s)) ds + X* 1 L(x(s),u(s)) ds + ^(h)) (52) 
> f{ L(x(s),u(s))ds + V(r,x(r)). 

Now minimizing, we obtain 



V(t,x) = inf u( . )6Wt(i J(t,x;u(-)) 

> inf u (.)ew t , tl {f t L(x(s),u(s)) ds + V(r, x(r))} 



(53) 
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which is the desired second half of (6). This establishes the dynamic programming prin- 
ciple (6). 

Regularity. By regularity we mean the degree of continuity or differentiability; i.e. of 
smoothness. The regularity of value functions is determined by both the regularity of the 
data defining it (e.g. /, L, ip), and on the nature of the optimization problem. In many 
applications, the value function can readily be shown to be continuous, even Lipschitz, but 
not C 1 in general. The finite horizon value function V(t,x) defined by (5) can be shown 
to be bounded and Lipschitz continuous under the following (rather strong) assumptions 
on the problem data: /, L, ip are bounded with bounded first order derivatives. We shall 
assume this. 

It should be noted that in general it can happen that value functions fail to be con- 
tinuous. In fact, the viscosity theory is capable of dealing with semicontinuous or even 
only locally bounded functions. 

Viscosity solution. Let us re-write the HJ equation (7) as follows: 

-^-V(t,x) + H(x,V x V(t,x)) = in(i ,ti) xR n , (7)' 
with a new definition of the Hamiltonian 

H(x,X) = sup {-X-f(x,v) -L(x,v)}. (9)' 
t>eR m 

The sign convention used in (7)' relates to the maximum principle in PDE, and is compat- 
ible with the convention used for the general HJ equation (1). Note that the Hamiltonian 
is now convex in A. 

A function V G C([i ,£i] x R n ) is a viscosity subsolution (resp. supersolution) of (7)' 
if for all G C^o^i) x R n ), 

d 

-—<f){so,x ) + H{xo,V<i){so,xo)) < (resp. > 0) (54) 

at every point (sq,Xq) where V — attains a local maximum (resp. minimum). V is a 
viscosity solution if it is both a subsolution and a supersolution. 

We now show that the value function V defined by (5) is a viscosity solution of (7)'. 

Subsolution property. Let <j) G C 1 ((to,^i) x R n ) and suppose that V — <fi attains a local 
maximum at (so,xq); so there exists r > such that 

V(t, x) — <f>(t,x) < V(so,x ) — <f)(so,xo) V |x — x \ < r, \t — s \ < r. (55) 

Fix u(t) — u G U for all t (constant control) and let £(•) denote the corresponding state 
trajectory with £(so) — x o- By standard ODE estimates, we have 

\Z(s + h) -x Q \ < r (56) 

for all < h < h (some ho > 0) - since U and / are bounded. Then by (55) 

V(s + h, £(s + h)) - 0(s o + h, e(s + h)) < V{s , x ) - 0(s o , x ) (57) 
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for all < h < ho- Now from the dynamic programming principle (6) we have 

V(s ,x )< f° L(Z{s),u(s))ds + V(s + h,Z(s + h)) (58) 

J so 



i s 

which with (57) implies 

,<j>(so + h,c(s + h))-<f>( So ,x ))^ i r° +h 

Send h — > to obtain 

d 



.( ^o + ft.^o + A))-^^)) ) ^jT^^W-))* < 0. (59) 



% 0(s o , x ) - V0(s o , x )f(x , u) - L(x , u) < 0. (60) 

Now maximize over u to obtain 
d 

-— 0(s o , x ) + sup{-V0(s o , x )f(x , u) - L(x , u)} < 0. (61) 
ueu 

This proves that V is a viscosity subsolution. 

Supersolution property. Let G C 1 ((to,^i) x R n ) and suppose that V — (ft attains a 
local minimum at (so,xo); so there exists r > such that 

V(t,x) - <f>(t,x) >V(s ,x ) - <f>(s Q ,x Q ) V\x-x \<r, \t - s \ < r. (62) 

Again by ODE estimates, there exists h > such that 

\£(s + h) -x \ < r (63) 

for all < h < ho and all u(-) G U Sihtl , where £(•) denotes the corresponding state 
trajectory with £(so) = x . 

Assume the supersolution property is false, i.e. there exists a > such that 

d 

-ttMso.Xq) + sup{-V<f)(s ,xo)f(x ,u) - L(x ,u)} < -3ah < 0, (64) 

Ot U (zu 

where < h < h . Now (64) implies 

-J^WOO) - V<f>(s^(s))ms),u(s)) -L(Z(s),u(s)) < -2ah<0, (65) 

for all s G [s , s + h] and all «(•) G £4 0)tl , for /i > sufficiently small. 

By the dynamic programming formula (6), there exists Wo( - ) G W SOjtl such that 

rso+h 

V(s ,x ) > / £(£o(s), «o(s)) rfs + ^(«o + K£ (s + h)) - ah (66) 

J So 
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where £o(") denotes the corresponding trajectory with £(so) — x o, and combining this 
with (62) we have 

(f)(s + h^ (s + h)) -0(g o ,x o )) 1 f S0+h T(c( s ( » , . , fi7 x 
-( ^ ) - ^ y L(^o(s),M (s))rfs > (67) 

However, integration of (65) implies 

-( ^ ) - ^ y L(^o(s),M (s))ds < -2a. (68) 

which contradicts (67) since a > 0. This proves the supersolution property. 

2.3 Comparison and Uniqueness 

The most important features of the theory of viscosity solutions are the powerful com- 
parison and uniqueness theorems. Comparison theorems assert that inequalities holding 
on the boundary and/or terminal time also hold in the entire domain. Uniqueness follows 
from this. Such results are important, since they guarantee unique characterization of 
viscosity solutions, and ensure that convergent approximations converge to the correct 
limit. In the context of optimal control problems, value functions are the unique viscosity 
solutions. 

In this section we give a detailed proof of the comparison and uniqueness results for a 
class of Dirichlet problems, and apply this to equation (20) for the distance function. We 
also present without proof results for Cauchy problems of the type (7), (8). 

2.3.1 Dirichlet Problem 

Here we follow [3, Chapter II] and consider the HJ equation 

V(x) + H(x,VV(x)) = in SI, (69) 

a special case of (1). 

To help get a feel for the ideas, suppose V\, V 2 £ C(fi)nC 1 (fi) (i.e. are smooth) satisfy 

V^x) + H(x, Wi(z)) < (subsolution) , , 

V2(x) + H(x, VV^x)) >0 (supersolution) 

in Q and 

V\ < Vi on dfl (boundary condition). (71) 

Let Xq G f2 be a maximum point of V\ —V2. Now if xq e (interior, not on boundary) 
then VVi(xo) = VV^Xo) and subtracting the first second line of (70) from the first gives 

V 1 (x )-V 2 (x )<0 
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which implies 

Vi{x) - V 2 (x) < V 1 (x ) - V 2 {x ) < V xeU. 
If it happened that x G dQ, then using (71) 



(72) 



v 1 (x) - v 2 (x) < v 1 (x ) - v 2 (x ) < o v x e a 



(73) 




(74) 



To see this, suppose V\ and V 2 are two solutions. Now V\ = V 2 = ip on dQ. Then by the 
comparison result, we get V\ < V 2 in Q. Similarly, interchanging Vi, V 2 we again apply 
comparison to obtain V 2 < V\ in Q. Hence V\ = V 2 in Q. 

This illustrates the role of sub- and supersolutions and boundary conditions in the 
comparison and uniqueness theory. We now give a precise theorem and proof ([3, Theorem 
II.3.1]). This result does not use convexity or any connection to optimal control, and 
applies generally. 

Theorem 2.3 Let Q be a bounded open subset of~R n . Assume Vi,V 2 G C(Q) are, re- 
spectively, viscosity sub- and supersolution of (69), and satisfy the inequality (71) on the 
boundary. Assume that H satisfies 



for x,y G Q, A G R™, where lo\ : [0, +oo) — > [0, +oo) is continuous, nondecreasing with 
co>i(0) = (uj\ is called a modulus^). Then 



H(x,X)-H(y,X)\<cu 1 (\x-y\(l + \X\)), 



(75) 



Vi < V 2 in VI. 



(76) 



Proof. For e > define the continuous function on f2 x f2 by 



$ e (x,y) = V 1 (x)-V 2 (y) 



x-y 
2e 



Let (x £ , y £ ) G Vt x Vt be a maximum point for $ e over Vt x Vt. Then 



max(Vi —V 2 )(x) < max$ e (x,x) < max L § e (x,y) = § £ (x £ ,y £ ). 




(77) 




(78) 
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Let us prove (78). Now the fact that 

® £ (x £ ,X £ ) < $ £ (x £ ,y £ ) 

implies 

< V 2 (x £ ) - V 2 (y £ ) < C (79) 



Fe ~Ve\ 



2e 

for suitable C > (recall V 2 is bounded on f2), and so 

\x £ -y £ \ < {Ce) 1 ' 2 . (80) 

Therefore 

\%e - 2/e| -> as £ -> 0, (81) 
and by continuity, V 2 (x £ ) — V 2 (y £ ) — > as e — > 0; hence (79) gives 

|2 

^ as £ -> 0. (82) 



2£ 

We now need to consider where the points £ £ ,y £ lie. 

Case (i). Suppose x £ ,y £ G Q (both interior points), for all sufficiently small e > 0. Let 

^(y) = Vfa) - ^V^, = V 2 (y £ ) + (83) 

Now 0i,02 £ C 1 (^) ) x £ is a local maximum for V\ — 02, and y £ is a local minimum for 
V 2 — 0i- Also, 

V0!(y £ ) = = V0 2 (x £ ). (84) 



The viscosity sub- and supersolution definition implies 



V 1 (x £ ) + H(x £ ,^) <0, 
V 2 (y £ ) + H(y £ ,^) >0. 



(85) 



Subtracting we have 



Vi(s.) - V 2 (y £ ) + ff(s e , ^— ^) - tf(y £ , ^-^) < (86) 



and using the assumption on H 



Vfa) - V 2 (y £ ) < wi(|x e - y £ |(l + i^_J% (87) 



This implies 
and hence (78) follows using (81) and (82). 



$ £ (x £ ,y £ )<u; 1 (\x £ -y £ \{l+ lX£ Vel )), (88) 
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Case (ii). Now suppose there exists a subsequence — * as i — > oo such that x £i G 
or y £i G <9fi. If x £i G dfi, 

Vi(* e4 ) - V 2 (y £i ) < V 2 (x £i ) - V 2 (y £i ) -> (89) 

as i — > oo, or if y £i G <9f2, 

Vi(s ei ) - V 2 (y £i ) < V^) - V^) -> 

as i — > oo; hence (78). 

This completes the proof. 

The distance function is the unique viscosity solution of (20), (21). At first 
sight Theorem 2.3 does not apply to (20), (21). This is because equation (20) does not 
have the additive l^-term that (69) has, and this term was used in an essential way in the 
proof of Theorem 2.3. In fact, in general viscosity solutions to equations of the form 

H(x,VV) = (91) 

may not be unique! For instance, in the context of the Bounded Real Lemma both the 
available storage and required supply are viscosity solutions of equations of the type (91). 
It turns out that comparison/uniqueness for HJ equation (20) for the distance function 
can be proved, either directly using additional hypothesis (such as convexity [3, Theorem 
II. 5. 9]), or via a transformation as we now show. 

We use the Kruskov transformation, a useful trick. Define 

W = $(V) = l-e~ v , 
where V is the distance function (19). Then W is a viscosity solution of 

W(x) + \VV(x)\ -1=0 in n, 

if =0 on on, 

by the general properties of viscosity solutions mentioned above. By Theorem 2.3, we see 
that W is the unique viscosity solution of (93), and hence 

V = V(W)=$- 1 (W) = -log(l-W) (94) 
is the unique viscosity solution of (20), (21). Comparison also follows in the same way. 

2.3.2 Cauchy Problem 

In this section we simply state without proof an example of a comparison/uniqueness 
result, [3, Theorem III. 3. 15]. There are many results like this available, with various 
kinds of structural assumptions (e.g. (95), (96)) which must be checked in order to apply 
them. 
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(90) 



(92) 



(93) 



Theorem 2.4 Assume H is continuous and satisfies 

\H(x, Ai) - H(x, A 2 )| < X(l + |x|)|Ai - A 2 | (95) 

for all x, Ai, A2 G R n , and 

\H(x 1 ,X)-H(x 2 ,X)\ < cu(\x 1 -x 2 \,R)+uj(\x 1 -x 2 \\X\,R) (96) 

for all A G R n , xi,X2 € B(0,R), R > 0, where u is a modulus (depending on R). Let 
Vi, V 2 G C([to,ti] x R n ) be, respectively, viscosity sub- and supersolution of (7)' satisfying 

V 1 (t 1 ,x)<V 2 (h,x) V x G R n . (97) 

Then 

V!(t,x) < V 2 (t,x) V (t,x) G [f„,fi] x R ra . (98) 
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3 Stochastic Control 



References on stochastic control and probability theory include [38], [17], [10], [22], [24], 
[2], [9], [39], [1]. 

3.1 Some Probability Theory 

While probability theory, especially quantum probability, will be covered by other speakers 
at this workshop, we present in this subsection some basic definitions and ideas. 

3.1.1 Basic Definitions 

Classical probability theory considers events F, subsets of a sample space Q, and assigns 
a numerical value < P(F) < 1 to each event F indicating the probability of occurrence 
of F . The collection of all allowed events is denoted T . The basic construct of classical 
probability is the triple J 7 , P), called a classical probability space. 

To facilitate an adequate framework for integration (expectation), convergence, etc, 
there are a number of technical requirements placed on probability spaces. While the set 
Q of outcomes can be arbitrary (e.g. colors of balls in an urn, the set of real numbers, etc), 
the collection of allowed events T is required to be a cr-algebra. A a-algebra T contains 
the empty set (0 G JF), is closed under complements (F G T implies F c = {w G O : u £ 
F} G J 7 ), and is closed under countable unions ({-Fi}^ C T implies \J° =X -F G J 7 ). A 
pair (Q, J 7 ) is called a measurable space (on which one or more probability measures may 
be defined). 

A probability measure is a function P : J 7 — > [0, 1] such that (i) < P(F) < 1 for all 
F E J 7 , (ii) P(fi) = 1, and (iii) if F X ,F 2 , ... is a disjoint sequence of events in J 7 , then 

P(U=i^) = £ ? =iP(^)- 

In many cases, the set of outcomes Q will be a topological space, i.e. a set Q together 
with a collection r of subsets (the open sets), called a topology. A topology r contains 
the empty set, and is closed under arbitrary unions and intersections. If fl is discrete, 
then the set of all subsets defines the standard topology. If f2 = R or C (real or complex 
numbers), the standard topology can be defined by considering all open intervals or discs 
(and their arbitrary unions and intersections). Given a topological space (Q, r), the Borel 
(T-algebra B(Q) is the a-algebra generated by the open sets (it is the smallest a-algebra 
containing all open sets). Events in a Borel cr-algebra are called Borel sets. Often, when 
the topology or cr-algebra is clear, explicit mention of them is dropped from the notation. 

Let (Qi, T\) and (^2,^2) be two measurable spaces. A random variable or measurable 
junction is a function X : Q 1 — > Q 2 such that X~ 1 (F 2 ) = {ui G Q\ : X(u>i) G F 2 } G T\ 
for all F 2 G J 7 2 . In particular, a real-valued random variable X defined on is a 

function X : Q — > R such that X~ 1 (B) = {lo G Q : X(u) G B} G J 7 for any Borel set 
BcR. Similarly, we can consider complex- valued random variables. If P is a probability 
measure on (Q, J 7 ), the probability distribution induced by X is 

P X (B)=P(X-\B)), 
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so that (R, B(R),Px) is a probability space. If X has a density px(^), expectations of 
functions f(X) (e.g. moments) can be calculated via 



E[/(X)] = / /(x)Px(cte) = / f(x)p x (x)dx. (99) 

Note that the cumulative distribution function is given by Fx(x) = Px((— oo, x]), and 
the density px(x) — dFx{x)/dx (when it exists). 

Let Y be a random variable, and y = cr(Y) be the a-algebra generated by Y; i.e. 
the smallest a-algebra with respect to which Y is measurable. In general y C T . If 
Z is a random variable that is also ^-measurable, then there is a function such that 
Z(oj) = gz(Y(u)). Thus, Z is a function of the values of F. 

For < p < oo, the set L p (f2,jF, P) is the vector space of complex-valued random 
variables X such that E[|X| P ] is finite. It is a Banach space with respect to the norm 
|| X \\ p — E[|X| p ] 1 / p . The case p = 2 is of special interest, since L 2 (fi,^ r , P) is a Hilbert 
space with inner product 

(X,Y) = E[X*Y\. 

For p = oo, the space of essentially bounded random variables L°°(fi,jF, P) is a Banach 
space with norm || X \\oo— ess.supjX(u>)|. 

Example 3.1 Consider the classical probability space (f2,jF, P), where VI = {1,2}, T = 
{0, Q, {1}, {2}}, and P = (pi,P2)- A random variable X has the form X = (xi,x 2 ), where 
x±,X2 G C, and in this example, the spaces L°°(f2,jF, P) (equal to L 2 (fi,jF, P) as a set) 
consist of all such random variables. The expected value of X is given by 

ELY] = I X{u)V{du) = x lPl + x 2 p 2 . (100) 
Jn 

□ 



3.1.2 Conditional Expectations 

Let y C T be a sub-cr-algebra. The conditional expectation E[/(X)|y| of /(X) given ^ 
is the unique ^-measurable function such that 

E[f(X)I F ] = E[E f(X)\y]I F ] for all F G y. (101) 

Here, is the indicator function of the set F defined by If(oS) — 1 if uj G F, and 
If{w) = otherwise. 

This definition may seem abstract, and so we attempt to clarify it by describing what 
happens in the language of elementary probability and give examples. 

Recall from elementary probability the notion of conditional density px\y(x\y) of a 
random variable X given another random variable Y. It is given by 

Px\y(x\y) = -^y- (102) 
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where px,y{ x , y) is the joint density of X and Y and Py{v) is the density of Y. Using this 
conditional density conditional expectations can be computed by 



/CO 
f(x) PxlY (x\y)dx. (103) 
■oo 

Here, we have emphasized the fact that the conditional expectation is a function of the 
values y of the random variable Y. If y is the a-algebra generated by Y, then we write 
E[f(X)\y] = E[f(X)\Y}. Property (101) can be checked as follows: 



E[Ef([X)\y]I F ] = / Ef([X)\Y](y)p Y (y)dy 
>f 

fOO 



r poo 

= 11 f( x )Px\Y(x\y)p Y (y)dxdy 

J F J-oo 
p poo 

= 11 f{x)px,Y{x,y)dxdy 

J F J -oo 



J f(x)p x (x)dx 

W(x)i F }. 



Example 3.2 Let X and W be independent random variables with densities px(x) and 
Pw(w), respectively. Let Y be a random variable defined by 

Y = X + W. (104) 

The joint density of X and Y is px,Y (x, y) = px{x)pw{y—x), and this defines a probability 
measure P(dx) = px,y{x, y)dx on the Borel subsets of Q = Rx R. The conditional density 
is given explicitly by 

/ i \ px(x)p w (y-x) 
Px\Y{x\y) = -j , 

/ px(x)p w (y - x)dx 

which defines a conditional distribution n(y)(dx) = px\Y(x\y)dx. 

In the absence of measurement information, expectations of functions f(X) are eval- 
uated by 

(P,/)=E[/] = J f(x)P(dx), 

in accordance with (99). Measurement of Y provides information about X, allowing us to 
revise expectations using a conditional probability measure. Suppose we know that the 
values of Y occurred in a set F (F G y = cr(Y)). The revised or conditional probability 
measure is 

I F (y)p(x,y)dy 



n(F)(dx) = 1 j—^ dx, (105) 

p{F) 
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where p(F) = Prob(V G F). Then the conditional average of f(X) given F is 



(n(F)J) = J f(x)n(F)(dx) 

J J Pxy{x,y)f{x)dydx 



p(F) 
(n(y),f)dy 
~pjF) 



(106) 



Note that the last equality uses the invariance property (101). If A is a possible set of 
values of X (i.e. X G a{X)), with / = I A) then (106) reduces to the familiar elementary 
formula Prob(X G A\Y G F) = Prob(X G A,Y G F)/Prob(F G F). □ 



Example 3.3 Consider the (finite) discrete set Q = {1,2,3}. Let T be the standard 
discrete cx-algebra, namely all subsets T = {0, Q, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}}, and 
let P be the uniform distribution, P = (|, |, |). Let X = (2,4,9), i.e. X(l) = 2, 
X(2) = 4, X(3) = 9, and let Y = (1, 1, 4). Then y = a(Y) = {0, SI, {1, 2}, {3}}. Using 
the property (101) with F = {1, 2} we find that 

E[X\y](io) =3 forcjG {1,2} 

while F = {3} gives 

E[X\y](u) = 9 for iv G {3}. 

The random variable X has been averaged over the atoms {1,2}, {3} of y. The random 
variables Y and E[X|^] are constant on these atoms (y measurability) . The conditional 
expectation E[X|^] can be viewed as a function of the values y G {1, 4} of Y. Indeed, let 
g(l) = 3 and g(A) = 9. Then 

E[X\y\(u) = g(Y(u)). 
We write simply E[X|^](y) for g(y) (a slight abuse of notation). □ 



3.1.3 Stochastic Processes 

Heuristics. A stochastic process is a random function of time. e.g. Noise, Figure 3. 

Definition 3.4 A stochastic process {X„}^ is a sequence of random variables X, 
defined on (Q, JF, P). For each uj G VL, 

X(u) = {X (u),X 1 (u),...} 

denotes a sample path. 
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Figure 3: Sample paths of stochastic process X. 

If {X„}^L is real valued, then {X n }^ =0 can be viewed as a random variable with 
values in R N (path space; here N is the set of integers). 

It is important to quantify the information history or flow corresponding to a stochastic 
process. This is achieved using filtration. 

Definition 3.5 A filtration of a measurable space (Q, J 7 ) is a family {JF„}^ of sub-a- 
algebra T n d T such that 

T n <^T n +\ , Vn = 0, 1, ... 

Example 3.6 If {X n } is a stochastic process, 

Qn = cr(X 0: X , . . . , X n ) 
defines a filtration {Gn}, Qn C T called the history of {X n } or filtration generated by 

{X n }. ' □ 

Definition 3.7 A stochastic process {X n } is adapted to a filtration {J-'n} if X n is T n - 
measurable for all n = 0, 1, 2, . . .. 

Remark 3.8 We can consider T n are events that have occurred up to time n. 
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3.1.4 Martingales 

Definition 3.9 A stochastic process {X n } defined on (fi, JF, {jF n }, P) is called a super- 
martingale/martingale/ submartingale if 

1. {X n } is adapted to {.F n } 

2. E[\X n \\ < +oo 

3. £ , [X n+ i|J r n ] < X n / ElXn+xlFn] — X n / £ , [X n+ i|^ r n ] > X n , Vn. 

A martingale corresponds intuitively to a fair game of chance. 
Note: For m > n, 

E\X m \T n ] = £ , [X n+1 |J r ri ] 
= X n 

if {X n } is a martingale. In general, Vm > n, ElXmlFn] is the predicted value of X m given 
the history T n . 

A supermartingale (submartingale) decreases (increases) in conditional mean. 
Example 3.10 Let {Fn} be a filtration on (Q, JF) and define 

J'oo = <r(.F n , iiGN) 

Let 
then 

X n = E[Y\F n \ 

is a martingale. □ 
Example 3.11 Consider i.i.d (independent identically distributed) r.v's. 

w n = \\ w - p 'I ; 

I -1, w.p. {. 

and 

n 

x n = Y,w k 

= X n _x + W n 

X is a martingale w.r.t. T n = cr(W , W 1: . . . , W n ) since 

ElXn+ilFn] = E[X n + Wn+ll-Fn] 

= X n + PfW^+il^] (since X n is jF n -msble) 
= X n + E[W n+ i] (by independence) 
= X n (martingale) 

□ 
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3.1.5 Semimartingales 

Definition 3.12 A stochastic process A = {A n } is said to be predictable with respect 
to the filtration {J-'n} if Ai i s T n -\-measurable. 



Example 3.13 Suppose J- n = c(Z , Z\, . . . , Z n ) for some stochastic process {Z n }. If A 
is predictable w.r.t. {J-'n}, then A n is a function of Zq, Zi, . . . , Z n -i- i.e. the value of A n 
is determined by the values of Z ,Z±, . . . , Z n -\. □ 



Definition 3.14 Let X be a stochastic process adapted to {J-'n}- If 

X = A + M 

where A is {J-'n} -predictable and M is {J-'n} -martingale, then X is called a semimartin- 
gale. 

Remark 3.15 Semimartingales are important in engineering and physics : 

X = "signal" or "trend" + "noise" 
^ A ' ^~M~" 

A is sometimes called the "compensator" of X (relative to {J- n })- 

Theorem 3.16 (Doob Decomposition). If X is an integrable adapted process, then 
X is a semimartingale. 

Proof. Define 

n 

A n = ^^E[Xk — Xk-llJ-'k-l] 
fc=l 

M n = X n - A n 

By definition, A is {jF n }-predictable. Next, 

n 

ElMn^n-l] = -E[X n |jF n _i] — ^ E[E[Xk — Xk-^Fk-lWFn-l] 

fc=l 

n-1 

= E\X n \T n-i] — E[E[X n — X n _x\T n -^[\T — ^ E[X k — X fc „ 1 |jF fe _ 1 ] 

fc=i 

= X n -i — A n -\ 

M is a {jF n }-martingale. □ 
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Remark 3.17 Write AX n = X n - X n _i then 



AX n = AA n + AM n 
» ' — 



prediction "innovations" or new information 

AA n = E[AX n \F n _ 1 ] 
E[AM n \T n ^\ = 

where AM n is martingale increment. This is called the innovations representation. 
Example 3.18 Let 

n 

x n = Y,w k 

k=0 

where {Wk} ~ N(0, 1), i.i.d. By the innovation representation, AX n = W n , 

AA n = ^[AX n |^ n _i] = so that AX n = AM n = W n □ 

3.1.6 Markov Processes 

Definition 3.19 Let X be a stochastic process defined on (Q,J-,P) and taking values in 
a measurable space (S,S) and adapted to {J-'n}- X is called a Markov process if for 

m>n 

P{X rn e A\F n ) = P{X m e A\X n ) , VAeS 

That is, conditional probabilities of the future behavior of X given the whole past depend 
only on the current value. 

Example 3.20 Let S — R and S = B(R). W is i.i.d. process. 

X n+l = b(X n ,W n ) 

where b : R 2 — > R. X is a Markov process w.r.t. T n = cr(W , Wi, . . . , W n -i). 

E[<j>(X n+1 )\F n ] = E[<j>(b(X n ,W n ))\F n ] 

(f>(b(X n ,w))P Wn (dw) 



= E[<P(X n+1 )\X n ] 

Setting 4> = I a we see that X is Markov. □ 

Definition 3.21 The (one step) transition probability of a Markov process X are 
given by the transition kernel 

Px n+1 \xM\x) = P(X n+ i e A\X n = x) 
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Example 3.22 In the last previous example, 

/oo 
I A (b{x,w))P Wn (dw) 
-oo 



□ 



Proposition 3.23 ( Chapman- K olmog orov ) . 

Px n+ M) = J Px n+1 \xM\x)Px n (dx) 

E[ct>{X n+1 )} = J <f>(x')P Xn+1 (dx') 

= I <P{x>) [ P Xn+llXn (dx'\x)P Xn (dx) 
Js Js 



Proof. 



P Xn+1 {A) =E[I A {X n+1 )] 

= E[E[I A {X n+1 )\F n \\ 
= E[E[I A (X n+1 )\X n ]] 
E[P Xn+llXn (A\X n )] 

P Xn+1 \xM\x)Px n {dx) 



□ 

Example 3.24 (Markov Chain). Now suppose S = {s±, s 2 , . . . , Sn}. Let P = (Pij) be 
an N x N matrix such that 

N 

^2 p ij = 1 » Oij>Q 

3=1 

Define the transition kernel 

P Xn+1 \X n {Sj\Si) = Pij 

and the probabilities 

Pn(i)=PxM) = P(X n = Si ) 

Then, the Chapman-Kolmogorov equation reads 

N 



Pn+l(j) = ^PijPnii) 



i=l 

i.e. 



Pn+l = P*Pn 
Pn = (P*) n P0 
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where po(si) = P(X = Sj) = p(i) is the initial distribution. Thus the probability 
distribution of X n satisfies the dynamics 

f Pn+l = P*Pn 
\P0 = P 

Here we regard p n as a column vector. If, instead, we regard p n as a row vector, then 

Pn+l = PnP 

□ 

Remark 3.25 In general, suppose Px n+1 \x n is independent of time n. We write 

p{A\x) = P Xn+llXn (A\x) 
p(A) =PxM) 
p n {A) =PxM) 

then 

[p n +i= P*Pn 
\Po = P, 

where 

(P*p)(A) = / p{A\x)p(dx) 
Js 

3.1.7 Observation Processes 

Definition 3.26 Let X be a Markov process with values in S. An observation process 
is an O valued process defined by the observation probabilities Py\x 

P(Y n e B) = I P Ylx (B\x)P Xn (dx) , B e B(R) 
Js 

Note that for each x G S, P Y \x(-\x) is a probability measure on (0,B(0)). 
Example 3.27 Let {V n } be i.i.d. independent of X. Define 

Y n = h(X n , V n ) 

then 

Py\x(B\X n ) = P(Y n e B\X n ) 

= P(h(X n , V n ) e B\X n ) 

= J I B (h(X n ,V n ))P Vn (dv) 

□ 
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3.1.8 Linear Representation of a Markov Chain 

Suppose we replace S by S' = {e 1 ,e 2 , ■ ■ ■ ,e^} where is the i-th unit vector in H N . 
Then X n = e$ for some i — 1, 2, . . . , N 

Lemma 3.28 Let X n be a Markov chain with probability matrix P. Then 

AM n+1 = X n+1 - P*X n 

is a martingale increment. 

Corollary 3.29 X has the semimartingale or innovations representation 

X n+l = P*X n + AM n+1 

or 

AX n+1 = (P* - I)X n + AM n+1 
3.2 Controlled State Space Models 

We will define controlled stochastic systems in terms of controlled transition probabilities 
and output probabilities. 

Definition 3.30 A stochastic system (S,U,0, P,Q, p) is specified by 

• Borel spaces S, U, O 

(state, control or input, observation or output) 

• Transition probability P 

P(-\x,u) is a probability measure on S , V(x,u) G S x U . 

• Output probability Q 

Q(-\x) is a probability measure on O , \/x G S 

• Initial distribution p of X . 

The evolution of the system is described as follows. If X n = x G S is the state at time 
n and if U n — u G U is the control input applied at the time n, then the system moves to 
a new state X n+1 according to the probability measure P(-\x,u) and produces an output 
Y n according to Q(-\x). 
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Example 3.31 Nonlinear Systems. 

Let S = R™, U = R m and O = R p . The processes {W n } and {V n } are i.i.d., independent, 
and taking values in R r and R s respectively. X ~ p is independent of and V. Define 

b : R" x R m x R r -> R n 

and 

/i : R n x R s -» R p 

such that 

X n+ i = 6(X„, [/„, W„) 
F n =/i(X n ,K) 
The transition and output probability measures are 



P(A\x,u) = J I A (b(x,u,w))P Wn (dw) 

v/e5 = B(R n ) , (x, w) G R™ x U 



Q(B\x) = J I B (h(x,v))P Vn (dv) 

v/e0 = B(W) , x e S 



□ 



Example 3.32 Linear Systems. 

This is a special case of nonlinear systems with b, h are linear and W, V and p are Gaussian 
: W n ~ 7V(0, Q), K ~ N(0, R) and p ~ N(x , S ) : 

X n+1 = AX n + £[/ n + GW n 
Y n = CX n + HV n 

□ 

Example 3.33 Controlled Markov Chain (Finite State). 

Consider S = {s 1 , s 2 , . . . , s^}, U = {ui, u 2 , ■ ■ ■ , %} and O = {o±, o 2 , . . . , op}. w) 
is defined by a controlled transition matrix P(u) where 

N 

^P i3 {u) = l , P^u) >0,VueU, Si eS 

3=1 

P(Sj\Si,u) = Pij(u) 

Q(-\x) is defined by an output probability matrix Q : 

P(oj\si) = Qij 
p 

^ Qij = 1 , V Si G 5 , Qij > 

□ 
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3.2.1 Feedback Control Laws or Policies 

A deterministic sequence u = {uq, U\, u 2 , ■ ■ ■} of control values is called an open loop 
control. A feedback or closed loop control depends on the observations. More pre- 
cisely, let g = {g , g±, g 2 , ■ ■ ■} be a sequence of functions 

g n : O n+1 - U 

The value u n of the control at time n is given by 

u n = g n (yon) 

where y 0n = {y , y 1: . . . , y n }. Such a sequence g is called a feedback or closed loop 
control policy or law. Note that an open loop control is a special case of a closed loop 
policy. 

A policy g determines a probability measure P g on the canonical sample or path space 

a = (S xU x Y)°° 

t = B(n) 

Note uj £ f2 is of the form 

uj = {x ,M ,l/o,a:i,Mi,l/i- • • •} 

Let 

= cr(X , Y Q , Xi, Yi, . . . , X n , Y n ) 
y n =a(Y ,Y 1 ,...,Y n ) 

Then U n = g n (Xon) is adapted to y n and X n ,Y n is adapted to T n . Note that now 
y n = cr(Y , Yi, . . . , Y n , U , Ui, . . . , U n ) since g is used. 



3.2.2 Partial and Full State Information 

In general, the only information available to the controller is contained in the output or 
observation process Y; X is in general unavailable. This is the case of partial informa- 
tion. If O — S and Q(-\x) = S x (Dirac measure), then Y = X, and one has full state 
information. 

Control problems are much easier to solve when one has full state information. In the 
general case of partial information, one must do some filtering as well. 



3.3 Filtering 
3.3.1 Introduction 

Consider the (non-dynamic) filtering problem 

X = ix + W (signal) 



Y =CX + V (observation) ^ 107 ^ 
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where W ~ N(0,Q), V ~ N(0,R) are independent. Let's compute X = E[X\Y] using 
two methods: 

I. The "innovations" method. We will see that X has the representation 

X = fi + aY 

where Y = Y — Cfi is the "innovation". 

Note: We use the following notation: 
Means : 

X = E[X\ = fi 
Y = E[Y\ = Cfi 

Covariances : 

S x = Cov(X) = E[(X - n)(X - n) 1 ] 

= Q 

S Y = CoviY) = E[(Y -Cfi)(Y -Cn)'\ 
= CQC + R 
S XY = Cov(X, Y) = E[(X -fi)(Y- Cfx)'} 
= QC 

We claim 

x = /i + s X yS y 1 (r-c / i). 

Proof. Write 

V = / U + S X yS y 1 (F-C/i) 

v =X-v 

then 

E[u\ = 

E[u(Y - ¥)'} = 0. 

Then since these random variables are Gaussian, v and Y are independent (orthog- 
onal). Then, 

X =E[X -d+D\Y] 
= E[u\Y] + E[v\Y\ 
= + u 

This gives the "Kalman Filter" for the problem : 

X = fj, + QC'(CQC + R)-\Y - Cfi). 

□ 

Note that if we write error estimates by e = X — X, the conditional error covariance 
is given by 

S e = Sx — SxyS y 1 S / xy 

= Q- QC'{CQC + R)- l CQ 
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Remark 3.34 X — /j, is the projection of X = X — /i onto the subspace 

y = {aY : a G R} 
relative to the inner product £J[X7] = Cov(X, Y), see Figure 4. 




y 



Figure 4: Projection interpretation, 
i.e. X must satisfy X — (X — /j) _L Y 

i.e. (X - (X - fi) , y) = 0. Since X - (i = aY, we get (X, ?) = a(y, y). Thus, 

a=(X,Y)(Y,Y)- 1 



2. Using the "reference probability" approach. 

We consider the following processes are under a probability measure P, 



X = n + W 
Y =CX + V 



The joint density of X, Y is 



Px,r(x,y) = px(x)pY\x(y\x) 

in which 

px(ar) = (2tt|Q|)~2 exp(-|(a: - ti)'Q~\x - fi)) 
PY\x(y\x) = (2tt\R\)- 1 2 exp(-|(y - Cx)'R-\y - Cx)) 
Using a new "reference" probability measure P + , we have the new processes 



X = fi + W 
Y =V + 
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where V + ~ JV(0, i?), W ~ iV(0, Q), V + and W are independent. The joint density 
of X, Y is now 

Px,y( x >y) =Px(x)p v (y) 

where 

Pv {y) = {2'K\R\y 1 2eM-\y'R- 1 y) 

dP 

Write — — — = A, where 
dP + 

A(x,y) = exp(-|| J R- 1 Ca;| 2 + y'R^Cx) 

By Bayes' rule, 1 

E[X]Y] ~ E+[A\Y\ 
the conditional density is given by 

A(x,y)p x (x) 



PX\Yi 



x\ 



A(x,y)p x (x) dx 
Qx\v(x\y) 



R 



/ qx\r(x\y) dx 

We call qx\y the "unnormalised conditional density 

qx\v(x\y) = A(x,y)p x (x) 

Now we compute, 

X = E[X\Y = y] = [ xpx\ Y (x\y)dx 

xqx\v{x\y) dx 



I Qx\v(x\y) dx 
= fi + QC'(CQC + R)-\Y - Cfi). 



Remark : The main techniques for nonlinear filtering are the innovations approach and 
the reference probability method. 



1 Prove this using the invariance property (101) by showing that £7pY|Y].E + [A|Y] = £+[XA|Y]. 
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3.3.2 The Kalman Filter 

Consider the linear gaussian stochastic system : 

/ X k+1 = AX k + GW k 
\ Y k = CX k + HV k 

where the initial condition X ~ N(X ,T, ), W k ~ iV(0,<2)and V k ~ N(0,R) are all 
mutually independent with Q > 0, R > and S > 0. 

Filtering problem : to compute conditional expectations of the type 

fa = E[</>(x k )\Y , k ] 

Since all the random variables are Gaussian, the conditional distribution 

P Xh \ Yo ,M\Yo*) = P(*k e A\Y 0tk ) 

is Gaussian with density 

Px k \Y 0:k (x\Y 0:k ) = p k \ k {x\Y Q .. k ) 

= (27r|S fc | fc |) _ 2 eX p(-|(x - X k \ k )' (x - X k \ k )) 

where 

X k \k = -E[^fc|^o,fc] 

Sfc|fc = E[(X - X k \ k ){X — X k \ k )'\Y ^ k ] 
Note that the conditional distribution is completely determined by the finite dimensional 

2 

quantities X k \ k G R™ and Y, k \ k G R n . 

The filtering problem will be solved if we can find simple recursion for these two 
quantities. Thus 

4>k = / (f)(x)p k \ k (x\Y , k ) dx 

Theorem 3.35 (Kalman Filter, [38, chapter 7], [1, chapters]). The conditional den- 
sity p k \k ~ N(X k \ k , E fc | fc ) is obtained from 

X k+ \\k+i = AX k \ k + Lk+i(Y k+ i — CAX k \ k ) 

Xq\o = LqYq + Xo 
Sfe+i|fc+i = {I — L k+ iC)H k+1 \ k 
= AY^k^A' + GQG' 
^o|o — (I ~ LqC)T, 
Lk = ^k\k~iC (CTj k \ k _iC + HRH'y 1 
L = E C'(CE C + HRH'y 1 

Notation. p k +i\ k (x\Y 0jk ) denotes the conditional density of X k+X given Y 0;k . 
X k +i\k — E[X k+ i\Y 0tk \ 

Sfe+i|fc = E[(X k+1 — X k+ i\ k ){X k+ i — X k+1 \ k )'\Yo^] 
Pk+\\k ~ N(X k+1 \ k , E fc+1 | fc ) 
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3.3.3 The Kalman Filter for Controlled Linear Systems 

Let g = {g , gi, . . . , } be a feedback policy, possibly nonlinear. This determines : 

X a k+l = AX« + BU» + GW k 
Yl =CX a k +HV k 

The processes X 9 , Y 9 need not be Gaussian. However, the conditional densities p k +i\ k , Pk\k 
are Gaussian. 

Theorem 3.36 The conditional means X^ k and X k+1 ^ k are given by 

X 9 k+ i\ k+ i = AX 9 k]k + BUI + L k+1 (Y° +1 - C{AX 9 k+1 + BU'( +l )) 
XLi\* = ± X m-i + BUI + AL k (Y* - CX^_ X ) 
-^o|o = X o + -^0(^0 — CXq) 

Sfc|fc, and L k are given as before. 

Proof. Write 

Xl = X 9 k + X k , Y k 9 = Y 9 k + Y k 

where _ _ 

X 9 k+l = AX\ + BUI , X 9 = X 

Yl =CX 9 k 
X k+l =AX k + GW k , X ~iV(0,S ) 
Y k = CX k + GV k 

Then the conditional density of X 9 k given (Y^ k , U$ fe _ 1 ) is the conditional density of X k 
given F ,fc shifted by X 9 k . (See [38, page 102-3] for details.) □ 

3.3.4 The HMM Filter (Markov Chain) 

Let X, Y be a HMM defined by a transition matrix P and an output probability matrix 
Q: 

P(X k+1 = Sj\X k = = p i:j 
P(Y k = Oj\X k = s^ = qij 

This determines a measure P on a measurable space (fi,^ 7 ) (canonical path space, say). 
Filtering : We wish to compute the conditional probabilities 

P{X k = Sj \Y , k ). 

If we use the linear representation of Markov chain, we want to compute the (column) 
vector 

Pk\k = E[X k \Y 0jk ] 
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Recall that in the linear representation, X k = P*X k -i + AM^ where AM k is a martingale 
increment w.r.t. T k = cr(X 0yk ,Y 0yk ). 

Since Y k depends on AM k , it is not clear how to proceed. We will use a reference 
probability P + under which Y k is independent of AM k . 

Under P, the joint density of X 0>k , F ,fc is 

PXo,k,Y ,k( x O,k,yo,k) = Px ,k( x O,k)PY ,k\X , k (yo,k\ X 0,k) 

where 



fc-1 

PX 0ik ( x 0,k) =Y\.Pxi,x l+1 Px 

1=0 

k 

PY , k \X , k {yo,k\ X 0,k) =Y[Q 



'Xi,yi 
1=0 

p X() is initial distribution of X. 

The new measure P + corresponds to the joint density 

Px , k ,Yo,S X ^yo,k) = PX , k ( x 0,k)PY 0ik \X 0ik (y0,k\ X 0,k) 

where 

PYo, k \x 0:k (yo,k\ x o,k) = pki 
where p = j^O is the number of output values. Then, 

dP 



dP+ 



Tk 
k 



,Vl 



1=0 

Notation . We write ^ k for the diagonal matrix with entries pQi, yk , i — 1, . . . ,n. And we 
write 

Ok = E + [X k A k \Y 0)k ] 

(an n-vector since X k is an n-vector using the linear representation 

Note: Under P + , X is a Markov chain with transition matrix P and initial distribution 

p, Y is i.i.d. uniform and, X and Y are independent. 
Bayes' rule : 

o~ k 

Pk\k — 7 TV 

where (a k , 1) = YTi=i a k(i) = E + [k k \Y^ k }. (., .) = inner product in R™. 
Theorem 3.37 (EMM Filter) 

Pk\k = ^ k P*Pk-i\k-i 
Po\o = P 

C k = (* fc P*p fc _i| fc _i,l) 

c =1 
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Remark 3.38 In unnormalized form we have 



<7 fe = * fc P*(T fc _i 

a = p 



which is a linear recursion. 

Proof. 

• Step 1 

Recursion for a k . 



a k (j) = E+[X k (j)A k \Y , k ] 

= J B+[(P*X fe _ 1 + AM k ) jP Q^ k A k ^\Y 0}k ] 

= ^(j)Er=i^-^ + [^-i(0 A fc _i|y 0)fc -i] 

(by independence). 

= *kO')zr=iP« ^-iW 



Step 2 

By induction, 



To see this, 



This gives 
and clearly 



k 
1=0 



Ok 

Pk\k 



(<*k, 1) 

(ffe, 1) 

^fcP*Pfc-l|jfe-l(o-fc-l,l) 



Cfc = 



(o-fc-i, 1) 



0/ <<7fc_i,i> K-2,i> • • ' (<T0,1> 

1=0 

Note that the (nonlinear) recursion for p k \ k is just formula (6.2) in [38, pag 
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3.3.5 Filter for Controlled HMM 

Consider a controlled Markov chain with transition matrix P(u) and output probability 
matrix Q. Let g be a feedback policy 

Uk= 9k(Yo,k) 
We want to compute the conditional expectation 

P{X k = Sj\Y 0!k ,U 0j k-i) 

i.e. 

Pk\k = E[Xk\Yo,k, U 0! k-i] 
Theorem 3.39 ([38, page 81]). p g k , k does not depend on g. It satisfies 

pl\ k = i**^(«*-i)Pfc-i| fc _i 

Po|o = P 

C k = (*fcP*Pfc-i|fc-i, 1) 

c =1 
as before. 

Definition 3.40 A stochastic process £ = is ca//ed an information state if 

• £ fc zs a function ofY 0jk , C/ ,fc-i 

• £fc + i can 6e determined from (, k , Uk+i(ind u k . 

Example 3.41 = p^ fc is an information state. a k \ k = ^pf^ is also an information 
state. For linear systems, X k \ k is an information state □ 

3.4 Dynamic Programming - Case I : Complete State Informa- 
tion 

The stochastic system is described by (S,U,P,p). A control policy g = {g , gi, . . .} 
determines control U and state X processes : 

Uk = gk(Xo,k) 

P(X k+1 e A\X , k , U , k ) = P(A\X k , U k ) 

Let 

G = {g '■ g is a state feedback policy} 
denote the set of admissible controllers. 
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Cost function : 

M-l 

J(g) = E°[J2L(Xi,Ui) + HX M )} 
1=0 

where 

L : S x U 

$ : R 

L > , $ > 



3.4.1 Optimal Control Problem 

Find g* £ G minimizing J; i.e. 

J(<?*) < J(g) ,VgeG 

We solve this problem using dynamic programming. 

To this end, we define the value function 



M-l 

V k (x) = inf El tk [Y,L{X u U l ) + ^{X M) 

seG fc , M -i 

for jfe = 0, 1,...,M- 1 

V M {x) = 

where G fc ,« denotes policies # M = g k+1 , ...,gi} 

V k (x) is the optimal "cost to go", given that we start at x at time k; i.e. X k = x. 

Theorem 3.42 ([38, chapter 6]) V satisfies the dynamic programming equation : 

V k (x) = inf{L(x,w) + / V k+1 (z)P(dz\x,u)} 

ueU J g 

for k = 0, 1,...,M - 1 
V M (x) = $(x) 



(DPE) 



Further, let u k (x) denotes a control value which achieves the minimum in (DPE), for 
k — 0, 1, . . . , M — 1. Then the policy g* defined by 

u k = 9* k ( x o,k) = u* k ( x k) 
is optimal, and J(g*) = J s V (z)p(dz). 

Remark 3.43 Policies g for which g k is only a function of X k (and not X ^-i) are called 
Markov policies. Then w.r.t. P 9 , when g is a Markov policy, X is a Markov process. 

So the optimal policy g* above is Markov and the optimal state process X k is Markov : 

P 9 \X k+l e A\X , k ,U , k ) = P(A\x,g* k (x)) 
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Remark 3.44 Algorithm for finding the optimal state feedback controller : 

1. Set k = M , V M (x) = $(x) 

2. Set k = M - 1. Then solve (DPE) to get V M -i(x) and u^^x) 

3. Set k = M - 2. Then solve (DPE) to get V M - 2 (x) and u* M _ 2 (x) 

4. Continue, until k — 



5. Set 



(X ,i) = tiKXi) 
= ul(X k ) 
<7m-1V^0,M-i) = u* M _ 1 (x M _ 1 ) 



Proof. Define W by 



W k {x) = inf {L(x,u) + / W fc+ i(z)P(dz|a;,u)} 

ueU J g 

for Jfe = 0,l,...,M-l 
Our goal is to prove Wfc(x) = V k (x), and optimal. Define 

Af-l 



^(x, = ^ >Jfc [x; ^) + $(^m)] 



Claim (*): 



W k (x) = inf V fc (a:,^) = V k (x) 

g^Gk,M-i 



= V k (x,g*) 

Assume (*) true for k + 1, . . . , M (trivial for k = M). Then for any g e G k>M -i 

V k (x,g) = E 9 Xtk [L(x,g k ) + Y.f =k l +1 L{X h U{) + $(X M )] 
= E9 x k[ L ( x i9k) + V k+1 (X k+1 ,g k+hM _ 1 )\ 
>El k [L(x,g k ) + W k+1 (X k+1 )} 
> W k (x) 

by induction, with equality if g — g* 
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Example 3.45 Controlled Markov chain with transition matrix P(u). 
The dynamic programming equation is : 



Vfc(si) = mm{L(si,u) + ^ V k+1 (s j) P^u)} 



J'=l 



for fc = 0, 1, . . . , M - 1 

k V M {si) = ®(si) 



If we regard V k as a column vector in R n , etc, this is just 

V k = mm{L u + P(u)V k+1 } 

for k = 0, 1,...,M- 1 
Vm = $ 

This is a nonlinear backward recursion. The optimal Markov policy g* is 

g* k (X , k ) = u* k (X k ) 

where ul(si) achieves min in (DPE) and u* k can be viewed as a vector. □ 

Example 3.46 Nonlinear Systems 

X k+l =b(X k ,U k ,W k ) 
where W = {W k } i.i.d., independent of X ~ p. The DPE 

V k (x) = inf w) + / V k+1 (b(x,u,w))P w (dw)} 
ueu J Rn 

for fc = 0,1,..., M- 1 
Vm(x) = $(x) 

V k is a function of x G R™. is also a function of x G R n . □ 

Example 3.47 Linear Systems (LQG) 

X k+1 = AX k + BU k + GW k 

Then the DPE 

\4(x) = inf {L(x,u)+ / V fc+ i(Aa; + J BM + G'«;)(27r|Q|) _ ^exp(-^ / Q~ 1 w)dw} 
for k = 0,1,..., M- 1 
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Quadratic cost : 



L(x,u) = \x'Yx + \v! 'Au 



2" 

where r>0,P>0,A>0. To simplify, assume 

G=I , Q=I 

X k+1 = AX k + BU k + W k 



{W k } i.i.d. ~ JV(0,J) 
Theorem 3.48 fLQGj 



V k (x) = \x'P k x + a fc 

K( X ) = L k X 



' P k = T + A'P fc+1 A - A'P k+1 B[A + B'P k+1 B\- l B'P k+1 A 
for A; = 0,1,..., M- 1 

(R) ^ Pm = i 5 

L fe =-[A + J B'P fc+1 P]- 1 J B'P fc+1 A 

=|E£ fc+ i^ , fc = 0,l,...,M-l , a M = 

Note that : 

1. The Riccati equation (R) is a nonlinear backward matrix recursion. 

2. The optimal Markov policy is 

u k — 9 k (X 0jk ) = L k X k 

which is linear. 

Proof. By induction. It it true for k = M, (trivial). Assume true for k + 1, ... , M. 
Plug 14+1(2) = \z'P k +\z + a k+ i into (DPE) and evaluate the Gaussian integral (do this). 
Then minimize over u. This gives u* k . Plug back in to evaluate V k {x). This gives the 
Riccati equation (do this!). □ 

With complete state information, the optimal controller is state feedback, Figure 5. 

3.5 Dynamic Programming - Case II : Partial State Information 

Consider general stochastic system 

(S,U,0,P,Q,p) 
Control policies g E G are output feedback: 

Uk = 9k(Yo,k) 



46 



w 



X 



X 
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g*(x) 



Figure 5: State feedback. 



Cost function : 

M-l 

J(g) = E°[Y,L(Xi,Ui) + *(X M )] 
1=0 

as before, except g is a function of Y only. 

Partially observed optimal control : 
Find an output feedback controller g* G G such that 

J{g*) < J(g) ,V 5 6G 

Rather that treat the general case, we first do HMM's, and then linear systems. 



3.5.1 Optimal Control of HMM's 

Consider controlled HMM's with transition matrix P(u) and output probability matrix 
Q. To solve this problem, we could use any suitable information state. Here we will 
use 

TTfe = Pk\k 



Recall that 

(IS) 



7T fc = T(7T fe _i, £/fc-l, Y k ) 
7T = p 



where 



*(y)P*(u)n 

T(ir,u,y) ~ 



(*(y)P*(u)Tr,l)' 

We will show that optimal output feedback control of the HMM is equivalent to opti- 
mal control of (IS), a problem with full state information (ir k is the new state, which is 
observed!). 
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To see how this possible, write 

J{g) =E'\£%z 1 L(x i ,u i ) + *(x M )\ 

= ^ESo 1 E a [L(X t , Ui)\Y 0ii \ + E°[$(X M )\Y 0tM ]] 

= ^E£o 1 (^^) + (^M,<f)] 

where 

L u = L(.,u) 

(t,/> =5Xit(0/(0 

This J(<?) has been expressed purely in terms of the information state n k . 
Theorem 3.49 ([38, chapter 6]) Define 

f M-l 

V k (n) = inf El tk [J2^i,L ul ) + (n M ^)} 

9fc<jfc,M-i l^k 

for k = 0, 1, . . . , M - 1 
[V^tt) =(tt,$) 



T/ien \4 satisfies the dynamic programming equation 
(DPE) 



Vfe(7r) = min{(vr, L u ) + ^ ^ + i(T(tt, u, y))p(y|7r, u)} 



for Jfe = 0, 1,...,M- 1 

Vm(vt) = (7T,$) 

n 

p(y\ir,u) = Qj, y Pij{u)Ti(i). 

Further, ifu* k (n) achieves the minimum in (DPE), then the policy 

Uk = gl(Y ,k) = KM 

is optimal. 

Proof. See [38, page 85]. □ 
Note: The optimal controller depends on the observations Y 0:k through the informa- 
tion state 7Tfc. Such controllers are called separated, i.e. separated into a filter plus a 
controller, Figure 6. 

3.5.2 Optimal Control of Linear Systems (LQG) 

Consider a linear system 

X k+1 = AX k + BU k + GW k 
Y k = CX k + HV k 
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Figure 6: Separation structure for HMM controller. 



with quadratic cost 

L(x,u) = \x'Yx + \u'Ku 
= \x'Px 

where r>0,P>0,A>0. Assume for simplicity, G = I , H = I , Q = I , R = I. 
The conditional density 

TTfc = Pk\k ~ N(X k \ k , T, k \ k ) 

is an information state. Since T* k \ k is deterministic, X k \ k is itself an information state for 
the linear system. Thus we expect the optimal policy g* to have the form 

Uk = g*(Yo,k) = U* k ( X k\k) 

for a suitable function u* k (x). It turns out that u* k {x) is the complete state information 
controller derived earlier. 

Theorem 3.50 Let X k \ k be the conditional mean as determined by the Kalman filter. Let 
L k be the gain sequence determined by the state feedback linear quadratic problem. Then 

u k = 9*(Xo,k) = L k X k \ k 
is the optimal policy for the partially observed LQG problem. 

Note that this optimal controller is separated. 

Proof. See [38, page 105]. □ 
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Figure 7: LQG separation structure. 

3.6 Two Continuous Time Problems 
3.6.1 System and Kalman Filter 

Consider the (classical) system model 

dx c = (Ax c + Bu) dt + Gdw (108) 
dy = Cx c dt+ dv (109) 

where 

1. x c (t) is the state of the system (a vector), given by a linear stochastic differential 
equation driven by Gaussian white noise u>(£), 

2. y(t) represents the measured output variables, which are subject to additive Gaus- 
sian white noise v(t), 

3. u(t) is the control variable (a vector), which we take to be a function of the mea- 
surement data y(s), < s < t, which we write u(t) = K(y(s), < s < t), 

4. A, B, G and C are appropriately sized matrices, and 

5. w(t) and v{t) have zero mean, variance 1, and correlation a. 

Throughout we interpret stochastic differential equations in the Ito sense, so that, e.g. 
dw(t)dw(t) = dt, dv(t)dv(t) = dt, and dw(t)dv(t) = adt. 
The continuous time Kalman filter equations are 

dx c = (Ax c + Bu)dt + (Y C C T + Ga T )(dy - Cx c dt) (110) 
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and 



Y c = AY C + Y C A T + GG T - (Y C C T + Ga T )(CY c + aG T ) (111) 

with initial conditions x c (0) = x c0 , Y c (0) = Y (the initial Gaussian parameters), see [49, 
section 4.5], [16, Theorem 4.4.1 (when a — 0)]. 

3.6.2 LQG Control 

In continuous time, the classical linear quadratic Gaussian (CLQG) problem is to minimize 
the cost function 

J(K) = E[ f \\x T c {s)Px c {s) + \u T {s)Qu{s)} ds + \x T c {T)Sx c {T)\. (112) 
Jo 

for the linear gaussian model (108), (109). 

The optimal control is given in terms of the Kalman filter 

u*(t) = -Q^B T X c (t)x c (t), (113) 

where 

X c + A T X C + X C A - X C BQ^B T X C + P = 0, (114) 
and X C (T) = S, [49, Part I], [16, Theorem 5.3.3 (when a = 0)]. 

3.6.3 LEQG Control 

Another important class of stochastic control problems are the so-called risk- sensitive 
problems. The risk-sensitive cost functions attempt to give larger penalties for large 
deviations from desired values and tend to lead to designs which are more robust than 
LQG controllers. In this subsection we switch to continuous time. 

The classical linear exponential quadratic Gaussian (CLEQG) problem (a type of risk- 
sensitive problem) is specified by modifying the LQG cost using the exponential function 
as follows: 

J"(K) = E[exp/i( I [\x T c {s)Px c {s) + \u T {s)Qu{s)} ds + \x T c {T)Sx c {T))}. (115) 
Jo 

Here, fi is a parameter indicating the sensitivity to risk. If fi > 0, the problem is risk- 
sensitive, and consequently gives greater weight to the size of the integral compared to the 
LQG case. If ji < is referred to as risk-seeking. The risk-neutral case /i — > corresponds 
to the LQG problem above, 

-logJ^(K)^ J(K), (116) 

see [8], [37]. 
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The CLEQG criterion is potentially infinite if \i > is too large. However, when finite 
the problem is solvable and the optimal solution is 

u *(t) = -Q- 1 B T X^(t)[I - ^(ijlfWr^fft), (117) 

where 

dx» = {{A + fjYfP]x% + Bu)dt + (Y?C T + Ga T ){dy - Cx£dt), (118) 

Yf = AY? + Y?A T + GG T + nYfFY? - (Y?C T + Ga T )(CY? + aG T ), (119) 

with initial conditions x%(0) = x c o, Yf(0) = Y , and 

Xg + A T X£ + X£A - X^[BQ- 1 B T - fiGG T ]X? + P = 0, (120) 

with terminal condition X£(T) = S, see [8, Theorem 4.1 (when a = 0)], [49, section 8.5], 
[14, Theorem 3.11]. 

It is important to note that the CLEQG filter (118), (119) is not the LQG Kalman 
filter, but reduces to it when fj, — 0. The CLEQG state x%(t) can still be interpreted as a 
state estimator for x c (t), but it is not the optimal minimum mean square error estimator: 
in general x%(t) ^ x c (t). However, x%(t) does serve as an information state for the CLEQG 
problem, since the cost function (115) can be expressed in terms of it, see [8, Theorem 
3.2]: 

J"(K) = E[J exp(-^x T Sx)n»(x,t)dx] (121) 

where 7r M (a;,t) is a gaussian density with mean x^(i) and covariance Y?{t). In general, it 
does not seem possible to represent the CLEQG cost in terms of the Kalman filter state 
x c (t). 
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4 Robust Control 



References for this section include [30], [54], [19], [26], [4], [46], [31]. 
4.1 Introduction and Background 

As mentioned in the Introduction, section 1, robustness concerns the ability of a con- 
troller to cope with uncertainty, disturbances, and model errors. Typically, a controller 
is designed on the basis of an idealized model, which may neglect some features of the 
real system. The parameters of the model may or may not be well known. Further- 
more, the real system may be subject to disturbances, noise and dissipative effects. A 
robust controller should have good performance under nominal conditions, and adequate 
performance in other than nominal conditions. 

Figure 8 illustrates a common setup for robust control design. 





uncertainty 












physical system 
















controller 









y 



Figure 8: Setup for robust control. 



Note that uncertainty has been explicitly included in the setup. The robust control 
problem is to design a controller achieving the desired performance, explicitly taking into 
account the presence of uncertainty. Robustness can be guaranteed only for the classes of 
uncertainty that have been modeled. 
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Standard approaches to robust control design include the use of H°° methods, and 
integral quadratic constraint (IQC) methods. In H°°, which originated in the frequency 
domain [53] and gave rise to the (unwieldy) name, systems are regarded as possibly 
nonlinear operators acting on signals, and stability and robustness are characterized in 
terms of the boundedness of the operators. The controller is chosen to make the gain 
or norm 7 of the closed loop physical system-controller combination considered as an 
operator from disturbance inputs w to disturbance outputs z "small" . In view of the small 
gain theorem (see, e.g. [47, Theorem 6.1.1 (1)]), the design is capable of accommodating 
uncertainties of gain less than I/7. Detailed knowledge of the uncertainty is not required, 
save for the gain limitation and structural connections to the physical system. IQC 
design is closely related, but here the uncertainty is characterized by an integral quadratic 
constraint, and the controller is designed subject to this constraint. It is important to 
know that not all robust control problems are feasible, for instance a designer may choose 
too small a value of 7, and not be able to find a solution. 

4.2 The Standard Problem of Control 

Now we describe in more detail the standard problem of control. The problem entails 
a description of the plant and controller models, and definitions of the control objectives. 
The standard control problem corresponds to the Figure 9, which we now explain. 





G 




u 






y 










K 





Figure 9: The Closed-Loop System (G, K) 



4.2.1 The Plant (Physical System Being Controlled) 

Consider plants G with the following structure 

A(x) + B^w + B 2 (x)u 



G 



x 



I y 



d(x) + D 12 (x)u 
C 2 (x) + D 21 (x)w 



(122) 



Here, x(t) e R ra denotes the state of the system, and is not in general directly measurable; 
instead an output y{t) e R p is observed. The additional output quantity z(t) G R r is a 
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performance measure, depending on the particular problem at hand. The control input 
is u(t) G R m , while w(t) G R s is regarded as an opposing disturbance input. Detailed 
assumptions concerning the functions appearing in (122) are given in [31]. Here we note 
that the origin is assumed to be an equilibrium, and A(0) = 0, Ci(0) = 0, C 2 (0) = 0. 

4.2.2 The Class of Controllers 

The plant G was described by an explicit state space model and is assumed given. How- 
ever, in the spirit of optimal control theory, we do not prescribe a state space model for 
the controller K, since it is an unknown to be determined from the control objectives. 
Rather, we simply stipulate some basic input-output properties required of any admissible 
controller, namely that the controller must be a causal function of the output 



and the resulting closed loop system be well-defined in the sense that trajectories and 
signals exist and are unique. The controller K will be said to be null-initialized if K(0) = 0, 
regardless of whether or not a state space realization of K is given. 

4.2.3 Control Objectives 

The ifoo control problem is commonly thought of as having two objectives: find a con- 
troller K such that the closed loop system (G, K) is 

1. dissipative, and 



In §4.3 we define what is meant by these terms in the case of linear systems. We now 
describe their meanings for nonlinear systems; this gives an extension of control to 
nonlinear systems. 2 



The closed loop system (G, K) is ^-dissipative if there exist 7 > and a function 
(3(x ) > (3(0) = 0, such that 



This definition is saying that the nonlinear input-output map (G, K) : w 1— > z defined 
by the closed loop system has finite L 2 gain with a bias term due to the initial state Xq 
of the plant G. This inequality expresses the effect of the uncertainty w on the system 
performance z. 

2 The term "nonlinear control" has no precise mathematical meaning, but it has come into common 
use in the control engineering community and refers to nonlinear generalizations of -ffoo control (which 
has precise meaning for linear systems). 



K : !/(•) 




2. stable. 




(123) 
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While dissipation captures the notion of performance of a control system, another 
issue with control is stability of the system. The closed loop system will be called 
weakly internally stable provided that if G is initialized at any x , then if w(-) G L 2 [0, oo), 
all signals u(-), y(-), z(-) in the loops as well as x(-) converge to as t — > oo. By internal 
stability we mean that the closed loop is weakly internally stable and in addition if the 
controller has a state space realization, then the controller state will converge to an 
equilibrium as t — > oo. 

Dissipation and stability are closely related; see, e.g. [50], [32], [33], [46]. Indeed, 
dissipative systems which enjoy a detectability or observability property also enjoy a 
stability property. In our context, suppose the system (G, K) is z- detectable, that is, w(-) 
and z(-) G L 2 [0, oo) imply x(-) G L 2 [0, oo) and x(t) — > as t — > oo. By z-observable we 
mean that if w(-) = 0, (•) = 0, then x(-) = 0. If (G,K) is 7- dissipative and z-detectable, 
then (G,K) is weakly internally stable (see [31, Theorem 2.1.3]). 

Solutions to this problem are descrbed in [46], [4], [31]. 



4.3 The Solution for Linear Systems 

We recall here the well-known solution to the control problem for linear systems, see 
[19], [45], [30], etc. The class of linear systems considered are of the form 

x = Ax + Biw + B 2 u 



G : < z = Cix + D 12 u 



k y = C 2 x + D 21 w. 



(124) 



4.3.1 Problem Formulation 

The class of admissible controllers K are those with finite dimensional linear state space 
realizations 

{r] = A K r] + B 1K y + B 2K u 
u =C K + D K y 

Given 7 > 0, the control problem for G is to find, if possible, a controller K such 
that the resulting closed loop system (G, K) : w 1— > z satisfies: 

(i) Dissipation: The required dissipation property is expressed in the frequency domain 
in terms of the norm of the closed loop transfer function (G, K)(s) as follows: 

II (G,K) || Hoc SUp 0~ max [(G,K)(ju)} < 7. 



(ii) Stability: We require that the closed loop system 

(G, K) is internally stable. 
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4.3.2 Background on Riccati Equations 

Recall a few facts about Riccati equations. An algebraic Riccati equation 



(125) 



with real matrix entries A, R, Q and R, Q selfadjoint, meeting suitable positivity and 
technical conditions (see, e.g., [54, Chapter 13]), has upper and lower solutions E , E r so 
that any other self adjoint solution E lies between them 

The bottom solution is called the stabilizing solution because it has and is characterized 
by the property 

A + RZ a (126) 
is asymptotically stable. Likewise E r is antistabilizing in that 

—(A + RE r ) (127) 

is asymptotically stable. 

4.3.3 Standard Assumptions 

There are a number of "standard assumptions" that are needed for the necessity and 
sufficiency theorems about Hoq control. These can be expressed in various ways and here 
we follow [45]. 

The first two conditions concern the rank of the matrices D\ 2 and D 2 \. 



D' 12 D 12 = E x > 0, 
D 21 D' 2l = E 2 > 0. 



(128) 
(129) 



Next are two important technical conditions which take the form 

A - jul B 2 

Ci D l2 



rank 



n + m for all u> > 0, 



(130) 



and 



rank 



A- jul B x 



Co 



D 



21 



= n + l for all uo > 0. 



T3r 



These two conditions are commonly used in LQG control and filtering, and concern the 
controllability and observability of underlying systems. 
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4.3.4 Problem Solution 

The necessary and sufficient conditions for solvability of the problem under the stan- 
dard assumptions are: 

Condition 1: State feedback Control. There exists X e > solving the control-type Riccati 
equation 

(A - B 2 E^ 1 D' 12 C 1 )'X e + X e (A - B 2 E^ 1 D' 12 C 1 ) 

(132) 

+X e (±B 1 B[ - B 2 E- 1 B' 2 )X e + C[(I - D^E-'D'^C, = 
which is stabilizing, i.e., 

A - B 2 E^ 1 D' 12 C l + (7 _2 J Bi J B; - B 2 E^B' 2 )X e is asymptotically stable. (133) 

Condition 2: State Estimation. There exists Y e > solving the filter- type Riccati 
equation 

(A - B 1 D' 21 E 2 1 C 2 )Y e + Y e (A - B 1 D' 2V E 2 1 C 2 )' 

(134) 

+Y e ( 1 - 2 C[C l - C' 2 E 2 x C 2 )Y e + B X \I- D' 2X E 2 X D 2X \B\ = 
which is stabilizing, i.e., 

A - B 1 D' 21 E 2 1 C 2 + F e (7" 2 C , ;C 1 - C' 2 E 2 l C 2 ) is asymptotically stable. (135) 

Condition 3: Coupling. The matrix X e Y e has spectral radius strictly less than 7. 

Theorem 4.1 ([19], [45], [30]) The control problem for G , meeting certain techni- 
cal conditions is solvable if and only if the above three conditions are satisfied. If these 
conditions are met, one controller, called the central controller, is given by 

x — (A — B 2 E^D' 2X C X + (~f- 2 B 1 B[ - B 2 E^ 1 B[)X e )x 

+ (/ - T 2 YeX e y\B x D' 2X + Y e C' 2 )E 2 \y - [C 2 + D 21 B[X e ]x) 

(136) 

+(/ - T 2 YeX e )-\B 2 + T 2 YeC[D l2 ){u + E^D'^C, + B' 2 X e ]x), 
u = -E^D'^d + B' 2 X e ]x. 

It is important to notice that the filter for the if 00 controller in (136) is not the Kalman 
filter. This solution was first given in [29], [19]. 
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4.4 Risk- Sensitive Stochastic Control and Robustness 



An important connection between the deterministic H°° problem and the stochastic LEQG 
problem was discovered by Glover and Doyle, [29]. This can be seen by comparing the 
solutions to each problem: (136) and (117) with the identifications 7 = l/y/fl, 

B 1 = (GO), B 2 = B, (137) 

Cl =(T)' ^=(g?/>). (138) 

C 2 = C, D 21 = (0I). (139) 

Both problems are equivalent to dynamic games, [30], [4]. Connections with algebraic 
Riccati equations and disturbance attenuation were discovered in [44]. 

Mathematically, what underlies this connection is the interpretation of the risk-sensitive 
problem as an equivalent stochastic dynamic game. In general terms, this corresponds to 
a general convex duality formula (e.g., see [20, Chapter 1.4]): 

logE P [e>] = sup{E Q [/] - H(Q || P)} (140) 
Q 

where P and Q are probability distributions, and where the relative entropy is defined by 
(e.g., see [42, Chapter 11]) 

H(Q || P) = E Q pog^]. 

The duality formula (140) implies 

logE P [e / ]>E Q [/]-iJ(Q||P) (141) 

for any probability distribution Q which is absolutely continuous with respect to P. 
We let / be an integral of quadratic function 

f = fif [x'(t)Px(t) + u'(t)Qu(t)]dt, 



and let P be a probability distribution corresponding to a nominal model used for con- 
troller design. We also set Q to be probability distribution corresponding to the true 
physical system, with deterministic disturbance w shifting the mean of the noise. Then 



T 

2. 



H(Q || P) = / \w(t)\ 2 dt, (142) 
Jo 

and (141) implies the gain-type inequality [21], [11]: 

E true {[ [x'{t)Px{t)+u'(t)Qu(t)}dt] < i{logJ"+ / \w(t)\ 2 dt}; (143) 
Jo t 1 Jo 

cf. (123). 
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This shows that the performance (evaluated as a LQG mean with respect to the the 
true distribution Q) of the LEQG controller (designed assuming the distribution P) when 
applied to a real system is bounded from above by the sum of two terms. The first term 
is a constant determined by the optimal LEQG cost, while the second is a measure of the 
size of the "uncertainty". This second term is zero under nominal conditions (w = 0). 
See [11], [21] for further details. 
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5 Optimal Feedback Control of Quantum Systems 

References for this section of the notes include [5], [35], [36], [42]. 



5.1 Preliminaries 

An operator T on a Hilbert space H is positive if {ip : Tt/j) > for all E H. A 
generalized measurement is defined as follows [42, section 2.2.3]. Let {M n } be a collection 
of operators such that 

Y,MlM n = I. (144) 

n 

The probability of outcome n when the system is in state p is 

Prob(n) = (p,MlM n ), (145) 
and the state of the system after the measurement outcome n is obtained is 

M nP Mj 

p{n) = P^n)- (146) 

A positive operator valued measure (POVM) [42, section 2.2.6] is a collection {E m } of 
positive operators E n such that 

5>» = J- (147) 

n 

A generalized measurement {M n } defines a POVM {E n } via E n = MlM n . A POVM may 
define many generalized measurements, one of which is obtained by taking square roots. 

It is straightforward to see that an orthogonal projective measurement {P n } (a PVM) 
defines a POVM and a generalized measurement. 

Let VJl n denote the set ofnxn complex matrices. 

A linear map T mapping operators on H to operators is called positive map if T(A) is 
positive for all positive operators A on H. T is completely positive (c.p.), i.e. (T ® I n )(A) 
is positive for any positive operator AonH® Wl n , for any n. 

A quantum operation [34, section 3.1.2], [42, section 8.2.4] is a c.p. map T such 
that < tr[r(A)] < tr[A] for all positive operators A on H. A dynamic map or trace- 
preserving operation is a quantum operation that satisfies tr[r(A)] = tr[A], [34, section 
3.1.2]. 

Every c.p. map T can be expressed in operator-sum form 

Y{p) = Y,M nP Ml (148) 

n 

for suitable operators {M n }, [34, Corollary 3.1.1]. If Y is an operation, then J2 n MlM n < 
/, [42, Theorem 8.1], [34, page 74] and if it is a trace-preserving operation then J2 n M\M n = 
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I. Note that if {M n } is a generalized measurement, then the state reduction can be ex- 
pressed as 

"'<") = <w (149) 

where T(n) = M n pM\ is a quantum operation. 

Please note that use of the above terminology varies somewhat in the literature, with 
some differences in the definitions. For instance, some authors may defined all operations 
to be normalized tr[r(A)] = trL4]. 

Quantum operations and c.p. maps T are associated with adjoints 1^ via 

(r>,x) = ( P ,rtx> (150) 

for states p and operators X. This is the basic Schrodinger-Heisenburg duality. 



5.2 The Feedback Control Problem 



We consider the optimal control of a quantum system using standard digital or analog 
electronics, Figure 10. 



input 



u 



quantum system 
(nanoscale physical device) 



output 



feedback controller K 
(digital or analog electronics) 

Figure 10: Feedback control of a quantum system 



The problem is formulated as follows: 

1. We use a simple discrete time model, and we assume all variables are discrete- valued. 
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2. Evolution of the quantum system is interleaved with measurements. 

3. The controller is a causal function of the measurement data, so that 

u k = K k (y u .. .,y k ) 

4. The quantum system dynamics are modelled by a controlled quantum operation 
T(u,y). This can include the effects of measurement and an external environment. 
For fixed u, y it is a completely positive map (preserves positivity) and satisfies the 
normalization condition 

J2(nu,y)uj,i) = (uj,i) = i (151) 

S/GY 

The quantity 

p(y\u,cu) = (T(u,y)cu,I). (152) 

is a conditional probability distribution on the measurement values (depends on 
current input value and state). 

5. The performance objective J{K) can be expressed in terms of the conditional states. 
The objective is to be minimized to obtain the desired optimal controller K* . 



5.3 Conditional Dynamics 
5.3.1 Controlled State Transfer 

The idea is that if the quantum system is in state u k at time k, and at this time the 
control value u k is applied, a measurement outcome y k+ i will be recorded, and the system 
will transfer to a new state 0J k +i- The probability of y k+i is p(y k+ i\u k , uj k ). 

Selective or conditional evolution means that the new state u k +i depends on the value 
of the measurement y k +i, and we write this dependance as follows: 

u k+1 = A r (u k ,y k+1 )uj k , (153) 

where 

ArlMV^. (154) 
p{y\u,uj) 

Equation (153) is a discrete time stochastic master equation (SME), or quantum filtering 
equation. 

Example 5.1 Consider the operator T(u,y) is given by 

F(u, y)u = q(y\a)P a ESuE?*P a (155) 

a,b 
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where u is a state, and the adjoint is given by 

T\u,y)B = J2<l(y\ a X *PaBPaE£ (156) 

a, b 

where B is an observable. Here: 

1. E% are controlled operators that satisfy Ylb ^b = I- They can be used to model 
the influence of an environment on the system via 

Fu^ESuEZ* (157) 

b 

The simplest case is E% = T u for all b, which corresponds simply to closed unitary 
evolution with no environmental influence: 

S u u = T u ^ujT u . 

2. An observable A is measured. It is assumed to have a discrete non- degenerate 
spectrum {a}. The normalized eigenvectors are \a), with projections P a = \a)(a\ 

(P«M = <«MI«»- 

3. The measurement of A is imperfect; instead values y are measured with probabilities 
q(y\a). The kernels satisfy ^2 y q(y\a) = 1 for all a. 

□ 

Example 5.2 (Two-state system.) We now describe a specific instance of Example 5.1, 
viz. a two-state system and measurement device, where it is desired to use feedback 
control to put the system into a given state. 

A particle beam is passed through a Stern-Gerlach device, which results in one beam 
of particles in the up state, and one beam in the down state. It is desired to put all 
particles into the up state. In the absence of measurement noise, the following simple 
feedback scheme [52] achieves this objective: the beam of particles in the up state is 
subsequently left alone, while the beam in the down state is subject to a further device 
which will result in a change of spin direction from down to up. The final outcome of this 
feedback arrangement is that all particles are in the up state. 

We extend this example by accommodating repeated noisy measurements. Physically, 
the noisy measurements might arise from imperfectly separated beams, where a proportion 
of each beam contaminates the other, and/or from interference or noise affecting sensors. 

The pure states of the system are of the form 

|^>=c_ 1 |-l> + c 1 |l> = 
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The states | — 1) and |1) are eigenstates of the observable 

A =(~o°i) (158) 

corresponding to ideal measurement values a = — 1 and a — 1. It is desired to put the 
system into the state 

w -(!).« ww -(!!). 









M-a 






rpu 







Vk+i 



Figure 11: Two-state system example showing the controlled unitary operator T u and the 
noisy measurement device M-a with error probability a. 



In this example we do not consider the effects on an external environment. We define 
a controlled transfer operator T(u,y) as the following physical process, Figure 11. First 
apply a unitary transformation T u , where the control value u — means do nothing, 
while u — 1 means to flip the states (quantum not gate), i.e. 

We then make an imperfect measurement corresponding to the observable A. We model 
this by an ideal device (e.g. Stern-Gerlach) with projection operators 




followed by a memoryless channel with error probability kernels 

g (-l| - 1) = 1 - a 

q{-l\l) =a 

g(l|-l) =a 

g(ljl) =l-a 
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where < a < 1 is the probability of a measurement error (cf. [42, Figure 8.1]). 
The controlled transfer operator is therefore (from (155)) 

T(u, y)uj = q(y\ - \)P^T u uT^P^ + q{y\l)P 1 T u uoT^ P^ 

In this example, the control u can take the values or 1, and output y has values —1 or 
1(U = {0,1}),Y = {-1,1}). 

If we write a general density matrix as 

" = ( J T ) ' ( 159 ) 



then the controlled operators T(u,y) are given explicitly by 



r(0,-l)w 



r(i,-iv 
r(i,iv 



(1 — a)a;ii 
auj 2 2 

r(0, 1)cj = , , 

\ (1 - a)a;22 

(1 - a)cd22 

awn 

aw22 

(1 — 



□ 



5.3.2 Feedback Control 

In the above description of the quantum system (153), we have not described how the 
controls u k are determined by the measurements y k via a feedback controller K. We now 
do this. 

Feedback controllers should be causal, i.e., the current control value u k cannot depend 
on future values of the measurements yk+i,yk+2, ■ ■ ■■ On a time interval < k < M — 1 
this is expressed as follows: 

K = {K ,K 1 ,...,K M _ 1 } 

where 

m = Ko 
mi = Xi(yi) 
m 2 = K 2 (yi,y 2 ) 
etc. 

To simplify notation, we often write sequences m^^m^+i, . . . ,Mfc 2 as Uk lt k 2 - Then we 
can write Uk = K k (y ljk ). A controller K can be restricted to subintervals k < j < M by 
fixing (or omitting) the first arguments in the obvious way. We denote by K the class of 
all such feedback controllers. 
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Pi Ul 


P2 ^2 




1 ..(o-i) 

2 U l 

1 . .(0,1) 

2 ^1 


2a(l-a) 4 U '" 1W1 '" 1J 

a 2 + (1 _ a) 2 W (0,-1),(1,D 

2a(l - a) 4°' 1) ' ( °'" 1) 
a 2 + (l-a) 2 4°' 1) ' (0 ' 1) 


Po = ^0 


Pi 


P2 



Table 1: State evolution under the controller K. 



A feedback controller K in closed loop with the quantum system, Figure 10, operates 
as follows. The given initial state ujq and controller K are sufficient to define random 
sequences of states u; ,m, inputs Mo,m-i and outputs y lt M over a given time interval < 
k < M iteratively as follows. The control value uq is determined by K Q (no observations 
are involved yet), and it is applied to the quantum system, which responds by selecting 
yi at random according to the distribution p{di\uq,ujq). This then determines the next 
state lo\ via (153). Next u\ is given by Xi(yi), and applied to the system. This process 
is repeated until the final time. 

The controller K therefore determines controlled stochastic processes u>k, u k and y k 
on the interval < k < M. Expectation with respect to the associated probability 
distribution is denoted Eff o . The state sequence u)k is a controlled Markov process. 

One way a controller K can be constructed is using a function 

u k = u(u k ,k) 

where Uk is given by (153) with initial state uq. This controller is denoted . The SME 
equation (153) forms part of this controller, viz. its dynamics, and must be implemented 
with suitable technology (e.g. digital computer). Controllers of this type are said to have 
a separation structure, where the controller can be decomposed into an estimation part 
(i.e. filtering via (153)) and a control part (i.e. the function u). The separation structure 
arises naturally from the dynamic programming techniques, as we shall see. 

Example 5.3 (Two-state system with feedback, Example 5.2 continued.) We consider a 
particular feedback controller K for a time horizon M = 2 defined by 



u = K = 0, Ul = Ki(y 1 ) = 
We apply K to the system with initial pure state 

l^o) = -^| - 1) + -^|1), or u 



if yi = 1 

1 if 2/i = -1. 



1 1 
1 1 



(160) 



(161) 



The result is shown in Table 1, which displays the resulting conditional states 



LU. 



(«0,2/l) 
1 

(i*o ,yi ),(«i, yi) 



LU 



= Ar(«o,2/iVo, 
= A r ( Ul ,yM U °' yi) 
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and the associated probabilities. Explicitly, the terms shown in Table 1 are: 

Pi =p(yi\uo,w ), p 2 = p(y 2 \u 1 ,uj 1 ) 

(o-i) _ ( (1 - a) \ (o,i) _ fa 
Ul -\ a)> Ul -\Q(l-a) 

(0 ,-!),(! -1) _ # ,(0,1),(0 -1) _iflO 



W 2 - ^2 - 2 I 1 



(0,-l),(l,l) _ (0,1),(0,1) 

Co>o — Co>n 



a 2 



" a 2 + (l-a) 2 V (1-«)V 
Also shown are the non-selective states: 

Po = 

pi = p(-1|m 0) ^o)^i 0,_1) +p( 1 K>^oVi°' 1) 
1 
1 



pa =„(-l|l,4°'- 1) )^'- 1M1 *- 1) 
+p(l|l,o;i '- 1 Vf" 1) ' (1 ' 1) 
+p(-l|0,o;! ' 1) )4 ' 1) ' (0 '" 1) 

+p(i|o,c! ' 1) )4°' 1) ' (0 ' 1) 



(162) 



i 



a 2 + a(l — a) 
2 ^ a(l -a) + (1 -a) 2 

At time /c = the control u = is applied. If yi = — 1 is observed, as a result of 
the imperfect measurement, the system moves to the state cc^ ' -1 ^. Since y\ = —1, the 
controller K (160) gives U\ = 1. This results in the states u4°' ^ or c^ ' 
depending on the outcome of the second measurement y 2 . If, on the other hand, y± — 1 is 
observed, the system moves to the state a^ 0,1 ^. Since y 1 = 1, the controller K (160) gives 

= 0, and hence oo^ ) ' 1 ^'' y0 ~ 1 ^ or a;^ ' 1 ^ ' 1 ^, again depending on the outcome of the second 
measurement y 2 . This is illustrated in Figure 12. 

Note that when a = (perfect measurements), the feedback system terminates in the 
desired pure state p 2 = |1)(1|, as discussed above. The role of feedback control is clearly 
demonstrated here. With imperfect measurements, < a < 1, the system terminates 
in the mixed state p 2 given by (162), with the degree of mixing (indicating the expected 
degradation in performance) depending on the measurement error probability parameter 
a: 

trp 2 = (a 2 + a(l - a)) 2 + (a(l - a) + (1 - a) 2 ) 2 
< 1 if < a < 1 
= 1 if a = 0. 

□ 
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Figure 12: Physical realization of the two stages of the two-state system with feedback 
using controller K. Due to the merging of the beams in the second stage, we have the 
intermediate state 6)2 — 1^4°' 1 ^ + |k4°' 1 ^°' ^ if Hz = — 1 (with probability 2a(l— a)), 
or Q 2 = - 1),(1>1) + |^ ' 1),(0,1) if y 2 = 1 (with probability a 2 + (1 - a) 2 ). 



5.4 Optimal Control 

In this section we summarize dynamic programming results for a well-known type of finite 
time horizon optimal control problem, [5, 38]. The optimal control problem discussed here 
can be considered to be a prototype problem illustrating measurement feedback in the 
quantum context. The dynamic programming methods used in this paper for solving the 
optimal control problems are standard. 

We define a cost function to be a no n- negative observable L(u) that can depend on 
the control u. The cost function encodes the designer's control objective. We also use a 
non-negative observable N to define a cost for the final state. 

Example 5.4 (Two-state system with feedback, Example 5.3 continued.) To set up the 
cost function L{u) to reflect our objective of regulating the system to the desired pure 
state |1), we define 

X=i(^-1.7)=(- 1 ° 

where A is the observable corresponding to the projective measurement (158). We note 
that the expected value of X 2 is 

(1|X 2 |1) =tr[X 2 |l)(l|] =0 
(-1|X 2 |-1) =tr[X 2 | - 1)(- 1|] =1 

which gives zero cost to the desired state, and nonzero cost to the undesired state. We 
shall also introduce a cost of control action, as follows: 



c(u) 



if u = 
p if u — 1 



where p > 0. This gives zero cost for doing nothing, and a nonzero cost for the flip 
operation. Thus we define the cost function to be 

L(u) = X 2 + c{u)I (163) 
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and the cost for the final state is defined to be 

N = X 2 . 

This modifies our earlier objective of putting the system into the desired state by including 
a penalty for control action. □ 

Let M > be a positive integer indicating a finite time interval k — 0, . . . , M. Given 
a sequence of control values mo,m-i — uq, . . . , um-i and measurements y± t M — Hi, ■ ■ ■ , Dm, 
define the risk-neutral cost functional 

M-l 

J U , (K) = E« [J2 (wi,L( Ui )) + (ou M ,N)}, (164) 

i=0 

where 0Ui, % — 0, . . . , M is the solution of the system dynamics (153) with initial state 
ujq = uj under the action of a controller K. This is an appropriate quantum generalization 
of the classical LQG type cost. The objective is to minimize this functional over all 
measurement feedback controllers K e /C. 

Following [5] it is convenient to rewrite the cost functional (164). For each k, given a se- 
quence of control values Uk,M-i — v>k, ■ ■ ■ , %-i and measurements yu+i,M = Vk+i, ■ ■ ■ , Dm, 
define a random sequence of observables Qk by the recursion ([5, equation (3.1)]) 

Q k = T\u k ,y k+1 )Q k+1 + L(u k ), < k < M - 1 
Qm=N (165) 

When useful, we write 

Qk = Qk(Uk,M-l,Uk+l,M) 

to indicate dependence on the input and outputs. Q k may be called a cost observable. 
The cost functional (164) is given by 

J u ,o(K)= ^ (^>Qo(^(2/i,m)o,m-i,2/i,m)) (166) 

Here and elsewhere we use abbreviations of the form 

K(y ltM )o,M-i = (-^0,-^1 (yi), • • • ,K M -i(yi,M-i)) 

Remark 5.5 The cost observable Q k given by (165) and the expression in (166) is anal- 
ogous to the familiar Heisenberg picture used in quantum physics. It is very natural from 
the point of view of dynamic programming, and indeed (164) and (166) are related by 
iterating (165). Here is the first step: 

(u ,Qo) = (u ,T"<(uo,yi)Qi + L(u )) 

= (u ,L(uo)) + (r(uo,j/i)wo,Qi) 
= (uj ,L(u )) + (wi,Qi)p(j/i|uo,w ) 

where lo\ = A r (u ,yi)uj and p(yi\u ,u ) is given by (152). □ 
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The key idea of dynamic programming is to look at the current state at a current time 
< k < M — 1 and to optimize the remaining cost from the current time to the final 
time. This leads to an iterative solution. Accordingly, we define, for each < k < M, 
the cost to go incurred by a controller K (restricted to k < I < M — 1) to be 

Ju,k{K) = ^2 (^,Qk(K(y k +i,M)k,M-^yk+i,M)) ( 167 ) 

S* + i,m6Y m -* 

The dynamic programming equation associated with this risk-neutral problem is 

V(u,k) = M{(u,L(u)) +J2v(A r (u,y)u,k + l)p(y\u,u)}, 

t^r (168) 

V(u,M) = (u,N) 

where < k < M — 1. This is the fundamental equation from which optimality or 
otherwise of a controller can be determined. 

Let V be the solution to the dynamic programming equation (168). Then for any 
controller K G /C we have 

V(u,k) < J u>k (K). (169) 
If we assume in addition that a minimizer 

u*(lu, k) e aigmm{(ij, L(u)) + ^ V(A T (u,y)uj,k + l)p(y\u,uj))} / 17 qn 
ueU y eY V ; 

exists 3 for all u, < k < M — 1, then the separation structure controller K£ defined by 
(170) is optimal, i.e. 

■UfiKl) = V(uo,0) < J^oiK) (171) 

for all KeK. 

Example 5.6 (Two-state system with feedback, Example 5.4 continued.) We solve the 
dynamic programming equation (168) and determine the optimal feedback controls as 
follows. For k = M = 2 we have 

V(u,2) = (lo,X 2 )=u 11 

and hence for k — 1 

V(u, 1) = u u + mm[V (cu, 1), V 1 (u, 1)] 
where where Vo(u>, 1), Vi(cu, 1) are given in Appendix 5.5. Hence we obtain 

u * (( , n _ /0 ifV (o;,l)<K(a;,l) 
u^ij-^ iiVo(u;A)>Vl(u; ,l). 

At time k = we have 

V(u, 0) = un + mm[V (uj, 0), V^cu, 0)] 



3 The notation argmin ueU f(u) means the subset of values from U minimizing /. 



71 



if V (u,0) < Vl(w,0) 

1 if V {u,0) > V^i(w,0). 



where Vo(u;, 0), Vi(u;, 0) are given in Appendix 5.5, which gives 

The optimal risk-neutral feedback controller is given by 

u = K£ fl = u> ,0), «i = = «*K 1) 

where u;i = Ar(«o, 2/iVo- Aoie £/ierf £/ie control u\ depends on y\ through the conditional 
state iO\ (separation structure). A physical implementation of the quantum system with 
optimal risk-neutral feedback is shown in Figure 13. 



filter 



LOq 



(0,1) 



control 



u*(wi,l) 



J/1 = 1 



til 



w 



M-Q 



-1 



J/1 = -1 



(0,-1) 

CJQ -» ^1 



filter 



Ul 



<-u*(wi,l) 



control 



M-Q 



-1 



^2 P2 

physical system 



Figure 13: Physical realization of the two stages of the two-state system with feedback 
using the optimal risk-neutral controller fT"* (with c^o given by (161), we have Mo = 
u*(cj ,0) = 0, ui = u*(cji, 1)). 



Let's consider the special case a = and p — 0, with initial state (161). We then find 
that V (u , 0) = Vi(a'O) 0) = 0.5, and hence we take u*(lu , 0) = 0; i.e. u = 0. 

Next, if y 1 = —1 is observed, we have ui = | — 1)( — 1|, V (u 1: 1) = 1 and Vi(^i, 1) = 0. 
Hence we take u*(u;i,l) = 1, i.e. u\ = 1. However, if y\ = 1 is observed, we have 
ui = |1)(1|, Vo(cui, 1) = and Vi(ui, 1) = 1; and hence we take u*(u;i, 1) = 0, i.e. u\ = 0. 
In either case we achieve the desired state p 2 = u 2 = |1)(1|- 

This action is the same as that seen before for the controller K. The same controller 
is obtained for < a < 0.5 and p — 0, but u 2 will be a mixed state. If p ^ the optimal 
controller K™* will result in control actions that in general differ from those of K. □ 
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input phySiCal SyStCm output 
state Wfc 

cqn. (153) y 



control 
u*(co kl k) 



filter 
state Uk 
cqn. (153 



feedback controller if" 



Figure 14: Optimal risk-neutral controller K£ showing separation structure and states 
of the physical system uj k and filter 



5.5 Appendix: Formulas for the Two- State System with Feed- 
back Example 

The following quantities were used in the solution of the risk-neutral problem, Example 
5.6: 

V (U, 1) = UJu, Vl(u, 1) = UJ 2 2 +P 

Vq{u, 0) = uou + min[aa;ii,p + oo 2 2 - auo 22 ] 
-r-min^n - auj n ,P + omj 22 ] 

V]_(u, 0) = p + au u + u 22 - ctu 2 2 

+ min[o;a;ii, p + oo 22 - auj 22 ] 
+ min [p + auu , oo 22 - ocuj 22 ] 
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6 Optimal Risk- Sensitive Feedback Control of Quan- 
tum Systems 

References for this section include [36], [7], [12], [41], [43], [34], [13]. 
6.1 System Model 

We consider the problem of controlling an open quantum system S (e.g. an atom) that 
interacts with an electromagnetic field B. We suppose that the evolution can be influenced 
by control variables u that enter the system Hamiltonian H(u). The field B is continuously 
monitored, providing weak measurements of the system S, the results y of which are 
available to the controller K, which processes this information to produce the control 
actions u. The control in general is allowed to be a causal function of the measurement 
trajectory. The problem is to find a controller K so that it minimizes a risk-sensitive cost 
function J M (K), which will be defined below. 

In order to describe the quantum model for this problem, we need to consider how sys- 
tem variables evolve with time. System variables (e.g. position, momentum) are operators 
X defined on the Hilbert space underlying the system S. The evolution is determined by 
interaction with the electromagnetic field, the influence of which is modelled by quantum 
white noise. The quantum equations analogous to classical models (e.g. (108), (109)) are 

dX(t) = (-X(t)K(t)- K\t)X{t) + M\t)X{t)M{t))dt 
+ [X(t),M(t)}dB\t) - [X(t),M\t)]dB(t) 

and 

dY(t) = (M(t) + M\t))dt + dQ{t). (172) 

We now explain these equations which use the framework of quantum stochastic differen- 
tial equations, [28], [43]. 

Let u = u(-) be a control signal (a function of time t with values u(t) e U). The 
quantum model we use considers the system plus field as a total closed system with 
unitary operator U(t) = U u {t) (interaction picture), which solves the quantum stochastic 
differential equation (QSDE) [28, eq. (11.2.7)], [43, sec. 26], 

dU(t) = {-K{u{t))dt + MdB\t) - M^dB{t)}U{t) (173) 

with initial condition U(0) = I, where 

K(u) = \.H{u) + \m^M. 

ill Zi 

Here, M is a system operator which together with the field operator b(t) = B(t), model 
the interaction of the system with the channel. We denote adjoints with the symbol t. 
Note that equation (173) is written in Ito form (see, e.g. [27, Chapter 4]), as will all 
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stochastic differential equations in these notes. With vacuum initialization of the field 
channel, the non-zero Ito product is, [28, eq. (11.2.6)], dB{t)dB\t) = dt. Then system 
operators X evolve according to 

X(t)=j t (u,X) = tf(t)XU(t), 

and satisfy the quantum Langevin equation (QLE) (172), where M(t) = j t (u,M), and 
K(t)= 3t (u,K(u(t))). 

Let v denote the vacuum state for the field, and let ttq be an arbitrary initial system 
state. The initial state of the total system is po = <8> vv^. We write, for operators A 
and B, 

(A, B) = ti[A^B]. 

We regard the field operator B(t) as an input field [28, Section 11.3.2], with corre- 
sponding output field A(t) = j t (u,B(t)). The self adjoint real quadratures of the input 
field is defined by Q(t) = B(t) + B^(t). For the output field real quadrature we write 
Y(t) = jt{u,Q(t)), also self adjoint. This process satisfies the QSDE (172). The input 
quadrature is self commutative [Q{t),Q(s)] = for all s,t, and corresponds to quan- 
tum Brownian motion when the field is initialized in the vacuum state. Indeed, if 
denotes the set of all Wiener paths, the probability of a subset F C Vl T of paths is 
P°(F) = (vv^ , (F)) , where P®{F) is the projection operator associated with F and 
Q{s), < s < T. The probability distribution P° is the Wiener distribution, under which 
the increments q(t) —q(s), < s < t < T, are independent, Gaussian, with zero mean and 
covariance (t — s). The output fields Y(t) are also self commutative, and define a proba- 
bility measure P by P(F) = (p , Prf (F)) , where P^(F) is the corresponding projection 
operator. 

The input and output fields satisfy the non-demolition conditions [Q(t),X] = 0, for 
all t > 0, and [F(s), X(t)] = for all s < t. This means that we can continuously monitor 
the output field Y(t) without demolishing the system, say by homodyne detection. The 
results of the measurement is a real record y(-) G fir, which is used by a (classical) 
controller K to produce the input control signal u(t) by u(t) = K(t, y[o,t])- The notation 
used here is meant to indicate the causal dependence of the control on the measurements; 
y\o t t] indicates the segment of the measurement signal on the time interval [0, t], so in effect 
the controller K = {K(£, •)} is a family of functions. The measurement record is given by 
the SDE 

dy(t) = tr[(M + M*)n t ]dt + dw(t), (174) 
where w(t) is a Wiener process under P and 

dn t = -UH(u(t)),ir t ]dt + V[M]7i t dt +H[M]TT t dw(t), (175) 

where T>[c]p = cpc^ — \{c^cp + pc^c), and Ti,[c]p = cp + pS — ptr(cp + prf). Equation (175) 
stochastic master equation or Belavkin quantum filtering equation (e.g. [34, Chapter 
5.2.5]). 
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In unnormalised form, the Belavkin filter is 

do t = (-K(u(t))a t - a t K\u{t)) + Ma t M^)dt + (Ma t + <r t M*)dy(t). (176) 

Here, y is a standard Wiener process under the reference distribution P°. The normalized 
conditional state an be recovered by 

nt = zrvv (177) 
\°t, i) 



6.2 Risk-Neutral Optimal Control 

In this section we briefly discuss a risk-neutral problem of the type that has been studied 
by [5], [6], [18], [13]. 

Let Ci(u) be a non-negative self-adjoint system operator depending on the control 
value u, and let C 2 be a non-negative self-adjoint system operator. These so-called cost 
operators are chosen to reflect the performance objectives, and explicitly include the 
control so that a balance between performance and control cost can be achieved. The 
quantity 

rT 

Ci(*)dt + C 2 (T), (178) 

'o 

where C\(t) = jt(u,C\(u(t)), C 2 (t) = jt(u, C 2 ), accumulates cost over the given time 
interval and provides a penalty for the final time. The risk-neutral problem defined by 
the quantum expectation 



Jo 



J(K) = (p , f C^dt + C 2 (T)), (179) 
Jo 

where p — n o <H> v <E> v^. 

The key step in solving the optimal control problem specified by (179) is a stochastic 
representation followed by classical conditional expectation, which results in (recall section 
3.5.1) 

J(K) = E[ [ T (7r t , CMt)))dt + (tt t , C 2 )\ 
Jo 

= E°[/ ( ( 7 t ,C 1 Ki)))^ + ( ( 7 T ,C 2 )] (180) 
Jo 

where 7r t and a t are the conditional states, assuming interchanges of expectations and 
integrals are justified. 

Dynamic programming. The risk-neutral value function is defined by 



W(a, t) = inf E°j£ T (a s , C^ds + (a T , C 2 )\ 



(181) 
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and the corresponding dynamic programming equation reads 

gW(a,t)+ inf {C u W(a,t) + CAu)} = 0, < t < T, 
dt «eu L (182) 

W(a,T) =(a,C 2 ). 

We explain the meaning of the operator C u following classical stochastic control meth- 
ods [23]. For a fixed constant control value u (i.e. u(t) =mGU for all t), a t is a Markov 
process with generator C u , which is defined, when it exists, by 

C«f(*) = l^ Kfi[f{a f- m (183) 

for suitably smooth functions /(•). In fact, C u f(a) can be calculated explicitly for / of 
the form 

f(a)=g((a,X 1 ),...,(a,X J )), (184) 

where g is a smooth bounded function of vectors of length J, and Xi, . . . , Xj are system 
operators. It is given by 



C u f(a) = (185) 
1 J 

- 9jk((<r, X,),..., (a, Xj))(a, M^X j + XjM) (a, MX, + X k M) 



j,k=i 
j 

+ Y,9j((<r,Xi), (<r,Xj))(<r, -K(u)Xj - XjK(u) + M^X 3 M) 
j'=i 

for functions / of the form (184). Here gj and gjk denote first and second order partial 
derivatives of g. 

If the dynamic programming equation (182) has a sufficiently smooth solution W(a, t), 
then the optimal controller K* is given by 

do t = (-K(u(t))a t - <J t K\u(t)) + Ma t M^)dt + (Ma t + a t M^)dy(t) 
■ u(t)=u*(o?,t). (186) 

where u*(cr, t) attains the minimum in (182). The dynamical part of this controller is the 
Belavkin quantum filter, (176). 

6.3 Risk- Sensitive Optimal Control 

Instead of using the expected value of the quantity (178) as a cost function as in (179), 
we consider the average of the exponential of (178) in the following way, since we wish to 
generalize the LEQG cost (115). Define R(t) to be the solution of the operator differential 
equation 

^ = f CWJKt) (187) 
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with initial condition R(0) = I. Here, \x > is a positive (risk) parameter. The solution 
of (187) can be expressed as the time-ordered exponential 



R(t) =exp ^| d(s)ds 



We then define the risk-sensitive cost function to be the quantum expectation 

J"(K) = ( Po ,R j (T)e^ T) R(T)}. (188) 

Here, po = ® fi^, as above. 

It is shown in [36] that the quantum expectation can be replaced by an equivalent 
classical expectation, viz. 

J"(K) = E°[(a^e^)], (189) 

where the unnormalized state a? (a density operator, or information state) is the solution 
of the SDE 

da? = (-K»(u(t))a? - <r?Krt(u(t)) + Ma?M*)dt + (Mof + a?M r )dy 2 {t), (190) 

or 

dof = -^[#(u(*)),^]^ + P[M]CTf^ + ^ (191) 

where H[c]p = cp+pc\ and K^(u) = K(u) —p\Ci(u). The initial condition is 0^(0) = n . 
The expression (189) is similar to the classical forms [8, eq. (3.4)], [37, eq. (2.10)]. Another 
useful representation is given in terms of the following normalized risk-sensitive state 

7TT " 



namely 



J"(K) = E^[exp(/i I tr(C 1 (M(t))7rf)rft)(7r^,e' lC2 )] (192) 
Jo 

where E' 1 denotes expectation with respect to the probability distribution P M defined by 
dP» = A£dP°, where 



A£ = exp(-l I \ix[(M + M^]\ 2 dt + I tr[(M + M*)i$\dy(t)). 
Jo Jo 

The SDE satisfied by 7if is 

di$ = ~[H(u(t)),i$]dt + V[M\i$dt + ^W[Ci(u(*))]7tfcft + ft[MKdw"(t), (193) 

where w M (t) is a standard Wiener process with respect to defined by 

dy(t) = tr[(M + M^]dt + dw^t). (194) 
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Dynamic programming. Define the risk-sensitive value function S IJ/ (a,t) for an 
arbitrary initial unnormalized state a and initial time < t < T by 

^(a ) t) = infE° i JK,e^)], (195) 

where denotes the solution of (190) at time T with initial condition = a (we have 
made explicit the dependence on the initial state and time in the expectation notation). 
Note that the cost (188) is given by 

J^(K) = E° Oi0 [K,e^)] 

so that the optimal controller K M '* is determined by 

jm( K ^) = ^(tt o ,0). 

The method of dynamic programming in this context relates the value function at 
time t and at a later time t < s < T along optimal trajectories via the relation 

5"(<7,t)=iME° t [5"«, a )]. (196) 

This is the principle of optimality, [24, Chapter VI], [38, Chapter 6]. Note that by 
definition S ,x (a, T) = (a, e^° 2 ). The dynamic programming equation is 



d 



S»(a, t) + inf C^ u S"(a, t) = 0, < t < T, 



01 v ' ' <<eu v ' ~ (197) 

Note that in the dynamic programming PDE (197), the minimization is over the con- 
trol values u, whereas in the definitions of the cost (188) and value function (195) the 
minimizations are over the controllers K. 

For fixed u G U and for / of the form (184), we have 



C" u f(<j) = ^^g jk ((<j,X 1 ),... : {<j : Xj))(<j,M*X j + X j M){a,M j <X k + X k M) 
j,k=i 
j 

+ 5^<&«<7,Xi), . . . , (a,Xj))(a, -K»(u)Xj - X 3 K»{u) + M^MX198) 
j=i 

If the dynamic programming PDE has a sufficiently smooth solution S^(a, t), then the 
optimal controller K M '* can be obtained as follows. Let u^'*(<t, t) denote the control value 
that attains the minimum in (197) for each a, t. The optimal controller is obtained by 
combining this function with the dynamics (190): 

™* - d(J t = (-K"( U (t))o? - o?Krt(u(t)) + M<j?M*)dt + (Ma? + <r?M*)dy(t) 
■ u(t) =u»*W,t). 

(199) 
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6.4 Control of a Two Level Atom 



In this section we consider the application of the risk-sensitive control problem to the 
example studied in [13], namely the feedback control of a two- level atom using a laser. 

6.4.1 Setup 

The amplitude and phase of the input laser can be adjusted, so via the interaction with 
the laser the atom can be controlled. The real quadrature of a second field channel is 
continuously monitored, say by homodyne detection, providing an indirect measurement 
of the atom. The control input is complex, u = u r + iU{ = \u\e iar & u G C (the control field 
channel becomes a coherent state corresponding to u). The measurement signal i/2(t) is 
real. It is desired to regulate the system in the a z up state | j ) = (1, 0) T (the down state 

is | | ) = (0, 1) T , and a x = ^ J ^ , a y = ^ ? ^ ^ and a z = ^ J _^ ^ are the Pauli 

matrices). 

To model this example we use an additional unmeasured channel which interacts with 
the atom via an operator L. The corresponding risk-sensitive filter is [36] 

da» = {-K»{u{t))<j>t - a$Krt{u(t)) + La? L f + Ma?M r )dt 

+(Ma? + a?M j <)dy 2 (t), (200) 

where K» = \H{u) + \M^M + \Dh - %C x {u). 

In terms of the notation used in these notes, we have 

L = Kftj_, M = /t s <r_, H(u) = i(u*L — uL^), 

K 2 f + K 2 g = l, (T- = K j] 



u = c, ^H = «(o?) + ^l 2 (J?) 

C2 = c( ° ? ] , a>0,6>0,c>0. 



x l , 

Here, k 2 and /t^ are the decay rates into the control and measurement channels. The 
parameters a, b and c are weights for the components of the cost. Note that ( j |Ci(w)| j 
) > and ( | |C 2 | | ) > (if a > and c > 0), while ( | |Ci(0)| | ) = and 
( T IC2I T ) = 0, reflecting the control objective. 

6.4.2 Information State 

We use the framework described in previous sections to solve the optimal risk-sensitive 
control problem. Since the second (w-dependent) part of Ci(u) is proportional to the iden- 
tity, that part commutes with all operators and it is convenient to factor its contribution 



80 



to the risk-sensitive state by writing 

°t = \ ( n (t)I + x (t)&x + y(t)o- y + z(t)a z ) exp ^{ib J \u(s)\ 2 ds 



x(t)+iy(t) n(t)-z(t) J 

Then substitution into the SDE (190) shows that the coefficients satisfy the SDEs 

dn(t) = \na{n{t) - z{t))dt + K s x{t)dy 2 {t) (201) 
dx(t) = -|(1 - fia)x(t)dt + 2KfUr(t)z(t)dt 

+n s {n{t) + z(t))dy 2 (t) 
dy(t) = -|(1- fia)y(t)dt - 2KfUi(t)z(t)dt 
dz(t) = -(1 - \na)z(t)dt - (1 + \na)n(t)dt 

—2Kf(u r (t)x(t) - Ui(t)y(t))dt - K s x(t)dy 2 {t). 

The representation (189) reads 

J"(K) = E°[exp ^ jf %(s)| 2 c^ ±(ra(T) - ^T))^]. (202) 



6.4.3 Dynamic Programming 

We consider the value function (195) as a function of the coefficients, i.e. S^(n, x, y, z, t). 
In terms of these parameters, the dynamic programming equation is 

|^(n, x, y, z, t) + inf {C^S^n, x, y, z, t) + ±[ib\u\ 2 S»(n, x,y,z,t)} = 0, < t < T, 
SHn,x,y,z,T) = \{n-z)e» c , 

(203) 

where the operator is given, for sufficiently smooth functions f(n,z,y,z), by 

f = 2^ S X f nn + -^K s {n + z) f xx + ^K S X f zz 

+K 2 s x(n + z)f nx - n 2 x 2 f nz - K 2 s (n + z)xf xz 
+fn{\^a(n - z)) + / x (-|(l - A*a)x + 2K f u r z) 

+fy(~U 1 ~ ~ Z^fUiz) 
+f z (—(l - \na)z - (1 + |/xa)n 
-2K f (u r x - Uiy). 

Here, the subscripts / nx , etc, refer to partial derivatives, and the arguments n, x, y, z have 
been omitted. 

To construct the optimal risk-sensitive controller K M '*, we suppose that (203) has a 
smooth solution, which we write as 

S^(n,x,y,z,t) = nexp (-W(n, x, y, z, t)) . (204) 
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The minimum over u in (203) can be explicitly evaluated by setting the derivatives of the 
expression in the parentheses (it is concave) with respect to u r and Ui to zero. The result 
is 



2k f 

u?'*(n,x,y,z,t) = -j^(xW£(n,x,y,z,t) - zW£(n,x,y,z,t)) 

u?*{n,x,y,z,t) = -^(zW£(n,x,y,z,t) - yW?{n,x,y,z,t)). (205) 

The optimal risk-sensitive controller is then 

K"'* : u(t) = u^(n(t),x(t),y(t), z(t),t) + iu^(n(t),x(t),y(t), z(t),t), (206) 

where n(t), x(t), y(t) and z(t) are given by (201). 

Note that the dynamic programming equation (203) (which is a partial differential 
equation of parabolic type) is solved backwards in time, using the terminal condition 
specified: S^(n, x,y, z,T) = \{n — z)e^ c . The infimum in (203) can be removed by 
substituting in the optimal control values given by the explicit formulas (205), if desired. 
However, the form (203) is better suited to numerical computation, since the optimal 
control structure is preserved, [40]. Note that in this example, the risk-sensitive filter 
(190) is replaced by the finite-dimensional SDE (201); this fact is important for practical 
computational reasons. 

6.4.4 Risk-Neutral Control 

For comparison, consider the risk-neutral cost for this problem. Write 

a t = \ (n(t)J + x(t)a x + y{t)o y + z{t)a z ) . (207) 

Then from the SDE (176), we find that 

dn(t) = K s x{t)dy 2 {t) (208) 
dx(t) = -\x(t)dt + 2K f u r (t)z(t)dt + K s (n(t) + z(t))dy 2 {t) 
dy(t) = —\y{t)dt — 2KfUi{t)z{t)dt 

dz(t) = —z(t)dt — n(t)dt — 2Kf(u r (t)x(t) — Ui(t)y(t))dt — K s x(t)dy2(t). 
The risk-neutral representation (180) becomes 

J(K) = E°[| J {a{n{t) - z(t) + b\u{t)\ 2 )dt 

+|(n(T) - z(T))c], (209) 
and the dynamic programming equation is 



dt 



W(n,x,y,z,t) + inf {C u W{n,x,y, z, t)\{a{n - z) + %| 2 )} = 0, < t < T, 



uec 



W(n,x,y,z,T) = \(n - z)e c , 
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(210) 



where 



£ / ~~ 2 K s X fnn + 2 K s( n z ) fxx + 2 K s X fzz 

+n 2 x(n + z)fnx - n 2 s x 2 f nz - n 2 (n + z)xf s 

+ fy{~\y ~2KfUiZ) 

+f z (—z -n - 2K f (u r x - Uiy) 



xz 



Evaluating the minimum in (210) gives 




(xW z (n,x,y,z,t) - zW x (n,x,y, z,t)) 



(zW y (n, x, y, z, t) - yW z (n, x, y, z, t)), 



(211) 



K* : u(t) = u*(n(t),x(t),y(t),z(t),t)+ i u*(n(t),x(t),y(t),z(t),t), (212) 

where n(t), x(t), y(t) and z(t) are given by (208). Note that normalization of (208) results 
in [13, eq. (7)]. 



Note that the expressions for the both the risk-sensitive and risk-neutral controllers 
are similar, and involve a similar level of complexity for implementation. When a = 0, 
the risk-sensitive SDEs (201) reduces to the risk-neutral or standard SDEs (208), though 
the controllers will be different in general. 

6.5 Control of a Trapped Atom 
6.5.1 Setup 

We now apply risk-sensitive optimal control to the problem of cooling and confinement 
of a quantum mechanical oscillator undergoing position measurement considered in the 
context of the linear quadratic Gaussian optimal control problem in [18]. As discussed 
there, this model corresponds in an appropriate limit to an atom trapped in an optical 
cavity, where the atom imparts a phase shift on the light that depends on its position and 
may be detected outside the cavity, see Figure 15. 

In suitable coordinates, it is desired to keep the position q near zero (confinement), 
and the momentum p also near zero (cooling). A homodyne detection scheme on the light 
reflected from the cavity (assumed perfect) provides information about the position q. 
The Hamiltonian and measurement operators are 



Here, q and p are considered as operators. The mass of the atom is m, and u is the angular 
frequency of the harmonic motion in of the atom in its confining potential. The parameter 



H = 




83 




K 



Figure 15: Atom in cavity feedback control. 

k describes the sensitivity of the resulting measurement of position, larger values of k 
corresponding to better measurement sensitivity. This parameter can in principle be 
tuned by varying the power of the laser driving the cavity. The control input is u = 
(ui, U2) , with real coefficients bi,b 2 . The cost observables C% and C 2 are given by C\(u) = 
^(q,p)P(q,p) T + \u T Qu ) C 2 = 0, where P and Q are positive definite symmetric matrices. 

6.5.2 Information State 

It is shown in [18] that the conditional state 7r t is Gaussian if 7To is Gaussian. It also turns 
out that the risk-sensitive states and tt^ are Gaussian, as can be shown by lengthy 
calculations. Indeed, the mean and covariance parameters of 7rf are given by 

and 

= tr ^<] " ( tr trf) 2 , Y» = tr[p 2 ^] - (trtpvrf]) 2 , 

Y gp = tr t(^ + P^t] ~ tr[g?if ]tr[p?if ]. 

It is important to realize that these are not the conditional means and covariances for 
position and momentum, which correspond to the conditional state 7r t and are given by 
[18, eqs. (55), (48)-(50)]. 

The differential equations satisfied by these parameters are (when P12 = 0) 

dq? = {f-jm + liiPuYfqH + P 22 ^>1 + hu)dt + 2Y£dw» 

dp» = (-muo 2 q^ + b 2 u 2 )dt + fjL[P n Y£<f + P 22 ^V] + TY^dvf (213) 

and 

Y» = (2/m)Y£ - 8k(Yf) 2 + »[Pn(Yf) 2 + P 22 (>^) 2 - (M 2 /4)P 22 ] 
Yp = -2mtu 2 Yf p - 8k(Y£f + 2kh 2 + ^[P 22 (F/) 2 + Pn(Y£ p ) 2 } - (^ 2 /4)P n 
= Yf/m - mu 2 Y» - 8kY£Yf + »[Y£(P n Yf + P 22 F/)]. 
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The initial conditions are g"(0) = q , jf(0) = p , Yf(0) = Y q0 , Yf(0) = Y p0 , and Y£{0) 
Y qp0 , where qo,po,Y q0 ,Y p0 and Y qp0 are the Gaussian parameters for 7r . 
To facilitate more compact and general notation, we write 

Then the above equations take the form 

dx» = ([A + {iY»P]x^ + Bu)dt + Y^H T (dy - Hx^dt) (214) 
where dw^ = dy — Hx^dt and 

Y» = AY^ + Y»A T - Y»[H T H - fiP]Y^ + GG T - (/in 2 /4)SPS T (215) 

where 

/ 1.//A p 

-mu 2 7' V b 2 



l/m\ ( h 

mw 2 j' V 6 2 

G =(°),7/ = 2v^(10),E=(_° 1 J) 



6.5.3 Optimal LEQG Control 

We consider now how the optimal control is determined. In terms of the Gaussian pa- 
rameters, the cost representation (192) takes the form (omitting some immaterial terms) 

J"(K) = E^[exp(^ / (x* T (t)Px*(t)+u T (t)Qu(t))dt)] (216) 
2 Jo 

Consequently, the problem becomes a standard state feedback LEQG problem for the 
system (214), (215). The solution is [8], [49], [51]: 

u*(t) = -Q^B T X^(t)[I - fiY^X^t)]- 1 ^), (217) 

where 

X» + A T X fM + XM - X»[BQ- l B T - fi(GG T - (nh 2 /A))EPT, T }X" + P = 0. (218) 

It is important to note that this solution (217) differs from the classical LEQG solution 
(117) via the \xf? terms. 

6.5.4 Robustness 

Robustness of the QLEQG controller has been investigated in [51]. It can be seen from 
the simulation results in Figure 16 that the QLEQG controller provides a more graceful 
decline in performance (increase in average cost) than the LQG control controller as the 
nominal model/phyiscal model discrepancy, (3, increases. This is typical of the robust 
properties of a general LEQG controller, recall section 4.4. 
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Figure 16: LQG vs. QLEQG (sample graph) 
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